Articles

  • A successful Git branching model considered harmful

    When people start to use git and get introduced to branches and the ease of branching, they may do a couple of Google searches and very often end up on a blog post called A successful Git branching model. The biggest issue with that article is that it comes up as one of the first results in many git branching related searches, when it should instead serve as a warning about how not to use branches in software development.

    What is wrong with “A successful Git branching model”?

    To put it bluntly, this type of development approach, where you use shared remote branches for everything and merge them back as they are, is much more complicated than it should be. The basic principle in making usable systems is to have sane defaults. This branching model makes that mistake from the very beginning by not using the master branch for what a developer who clones the repository would expect it to be used for: development.

    Using individual (long lived) branches for features also makes it harder to ensure that everything works together when the changes are merged back. This is especially pronounced in today’s world, where continuous integration should be the default practice of software development regardless of how big the project is. By integrating all changes together regularly you avoid big integration issues that take a lot of time to resolve, especially in bigger projects with hundreds or thousands of developers. A development practice where every feature is developed in its own shared remote branch naturally drives the process towards big integration issues instead of avoiding them.

    Also in “A successful Git branching model” merge commits are encouraged as the main method for integrating changes. I will explain next why merge commits are bad and what you will lose by using them.

    What is wrong with merge commits?

    “A successful Git branching model” describes how non-fast-forward merge commits can be thought of as a way to keep all commits related to a certain feature nicely in one group. Then, if you decide that a feature is not for you, you can just revert that one merge commit and have the whole feature removed. I would argue that it is a really rare situation where you revert a feature, or where you even get it completely right on the first try.
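
    For reference, this is roughly what the suggested feature revert looks like; the commit hash here is just a placeholder:

      # Revert everything that came in through a merge commit.
      # -m 1 picks the first parent (the mainline) as the baseline.
      git revert -m 1 <merge-commit-sha>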

    Merges in git very often create additional commits that begin with a message like the following: “Merge branch 'some-branch' of git://git.some.domain/repository/”. That does not provide any value when you want to see what has actually changed. You need to go into the commit and read what actually happened there, probably from the second paragraph of the message. Not to mention going back in history to the merged branch and trying to figure out what happened in it.

    Having a non-linear history also makes git bisect harder to use when issues are only revealed during integration. Both branches may be perfectly good individually, yet the merge commit is broken, precisely because the changes don’t conflict textually and git combines them without complaint. This is not even that hard to run into: one developer changes some internal interface while another developer builds something new on top of the old interface definition. These kinds of issues can be easy or hard to figure out, but a linear history without any merge commits could immediately point out the commit that causes them.
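
    With a linear history, finding such a regression is mostly mechanical. A minimal git bisect sketch, where the tag and test script names are made-up examples:

      git bisect start
      git bisect bad HEAD              # the current state is broken
      git bisect good v1.2.0           # a commit or tag known to be good
      git bisect run ./run-tests.sh    # let git binary search using the test script
      git bisect reset                 # return to where we started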

    Something simpler

    Figure 1: The cactus model

    Let me show a much simpler alternative that we can call the cactus model. It gets its name from the fact that all branches branch out from a wide trunk (the master branch) and never get merged back. The cactus model should much better reflect the way of working that comes up naturally when using git while making sure that continuous integration principles are followed.

    Figure 1 shows the principle of how the cactus branching model works, and the following sections explain the reasoning behind it. Some of the principles shown here may need Gerrit or a similar integrated code review and repository management system to be fully usable.

    All development happens on the master branch

    The master branch is the default branch that is checked out after git clone, so why not have all development happen there as well? There is no need to guess or needlessly document the development branch when it is the default one. This only applies to the central repository that is cloned and kept up to date by everyone; individual developers are encouraged to use local branches for development but to avoid shared remote branches.

    Developers should git rebase their changes regularly so that their local branches follow the latest origin/master. This makes sure that nobody develops on an outdated baseline.
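
    A minimal sketch of that routine, run from a local feature branch:

      git fetch origin            # get the latest origin/master
      git rebase origin/master    # replay local commits on top of it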

    Using local branches

    The cactus model does not discourage using branches when they are useful. In particular, an individual developer should use short lived feature branches in their local repository and integrate them into origin/master whenever there is something that can be shared with everyone else. Local branches just make it easier to move between features while commits are being tested or are under code review.

    Figures 2 and 3 show the basic principle of local branches and rebases by visualizing the tree state. In figure 2 we have a situation with two active local development branches (fuchsia circles) and one branch that is under code review (blue circles) and ready to be integrated into origin/master (yellow circles). In figure 3 we have updated origin/master with two new commits (yellow-blue circles), submitted two more commits for code review (blue circles), and consider them ready for integration. As branches don’t automatically disappear from the repository, the integrated commits are still in the local repository (gray circles), but hopefully forgotten and ready to be garbage collected.

    Figure 2: Local development branches
    Figure 3: Local branches after integration and rebase

    Shared remote branches

    As a main principle, shared remote branches should be avoided. All changes should be made available on origin/master, and other developers should build their changes on top of that by continuously updating their working copies. This ensures that we do not end up in the integration hell that results when many feature branches need to be combined at once.

    If you use a staged code review system, like Gerrit or GitHub, then you can just git fetch the commit chain under review and build on top of it. Then git push your changes to some specific branch of your own repository, a branch that you have hopefully rebased on top of origin/master before pushing.
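
    A sketch of that flow, assuming a Gerrit-style setup; the change ref and branch names are made-up examples:

      # Fetch a commit chain that is still under review and build on top of it.
      git fetch origin refs/changes/34/1234/2
      git checkout -b my-feature FETCH_HEAD

      # ...work and commit...

      # Rebase on the latest origin/master and push the chain for review
      # using Gerrit's refs/for convention.
      git fetch origin
      git rebase origin/master
      git push origin HEAD:refs/for/master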

    Releases are branched out from origin/master

    Releases get their own tags or branches that are branched out from origin/master. If we need a hotfix, we just add it to the release branch and cherry-pick it to the master branch, if applicable. Using a specific tagging and branch naming scheme should enable automatic releases, but this should be completely invisible to developers in their daily work.
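
    A sketch of how that could look; the branch, tag, and commit message names are just examples:

      # Branch and tag a release from the current origin/master.
      git checkout -b release-1.0 origin/master
      git tag -a v1.0.0 -m "Release 1.0.0"

      # A hotfix is committed on the release branch first...
      git commit -am "Fix startup crash"

      # ...and cherry-picked to master if it still applies there.
      git checkout master
      git cherry-pick release-1.0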

    There are only fast-forward merges

    git merge is not used. Changes go to origin/master by using git rebase or git cherry-pick. This avoids cluttering the repository with merge commits that do not provide any real value, and it avoids the christmas tree look of the history. Rebasing also keeps the history linear, so that git bisect is really easy to use for finding regressions.
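
    Git can also be configured to enforce this locally; a couple of standard settings I would consider:

      git config --global merge.ff only      # refuse any merge that would create a merge commit
      git config --global pull.rebase true    # make git pull rebase local work instead of merging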

    If you are using Gerrit, you can also use the cherry-pick submit strategy. This enables putting a collection of commits onto origin/master in any desired order, instead of having to settle for the order decided when the commits were first put up for code review.

    Concluding remarks

    Git is really cool as a version control system. You can easily do all kinds of nifty stuff with it that used to be hard or impossible. Branches are just pointers to certain commits, so you can create a branch really cheaply from anything. You can also do all kinds of fancy merges, which makes using and mixing branches very easy. But as with all tools, branches should be used appropriately, to make developers’ daily development tasks easier, not harder by default.

    I have also seen these kinds of scary development practices used in projects with hundreds of developers when moving to git from some other version control system. Some organizations even take this as far as completely removing the master branch from the central repository, creating all kinds of questions and obstacles by not having the sane default that anyone who has used git anywhere else expects.

  • Starting a software development blog

    Isn’t it a nice way to start a software development related blog by talking about how to start a software development related blog? At least it goes one level up in meta blogging when an article about what needs to be taken care of when creating a software development related blog starts by pointing out how meta this is.

    Aren’t there tons of articles regarding this already?

    There are tons of articles about how to establish a blog. Some of them are very good at giving generic advice on topic and blogging platform selection, and you should use your favorite search engine to find those. I’m just going to explain my reasoning for the choices I have made when establishing this blog, with my current understanding and background research.

    Why blog at all?

    Why should I blog at all? Blogging takes time and effort, and I will probably lose interest in it after some weeks/months/years. The biggest reason for me to start blogging is that I very often find myself explaining things through email discussions, wikis, or chats, sometimes the same things multiple times. Also, I sometimes do benchmarks and other types of digging and comparison, and I would like to have a place to present these findings that I can easily link to.

    What I’m not after here is being part of some specific blogging community. The second thing I try to avoid is discussing topics that are not software development related, or just sharing links with a very short comment on them. Those things I leave for Facebook and Internet Relay Chat types of communication platforms.

    Coming up with ideas

    I have collected ideas and drafts that I could write about for almost a year now, and there are over a dozen smaller or larger article ideas in my backlog. So at least in the beginning I have some material available. The biggest question is whether I find this interesting enough to keep doing and how often I come up with new ideas that I could blog about.

    Deciding the language I write articles in

    I have decided to write my blog in English. It’s not my first, or even my second, language, so why write a blog in English? The reason is that software development topics come up naturally for me in English. Also, people who may be interested in these topics basically have to know English to have any competence in this industry, and I’m not aiming at complete beginners.

    Selecting a blogging platform

    I had a couple of requirements that a blogging platform should satisfy to fit my needs:

    • Hosted by a third party. I have hosted my own web server, and it requires fairly constant maintenance that I want to avoid. In particular, I don’t want to have constant security issues.
    • To be able to write articles locally on my computer and then easily export the result to the blog. Also as articles generally take quite a few trials and errors, they should be easy to modify.
    • Support source code listings and syntax coloring.
    • Be able to upload files of any type to the platform and link to them. I know that I will want to publish source code, so why not give out larger programs directly as files without having to list all of the source code in the article?
      • These files should be easy to modify because, as mentioned before, articles take some iterations to complete.

    First I was thinking about using Blogger as my blogging platform, as quite a few blogs that I follow use it. But I wasn’t convinced that transferring files from my local machine to the Blogger service would be all that convenient, especially for images and other types of files. Then I remembered that some people have pages on GitHub and stumbled upon Jekyll, which generates static files and is aimed specifically at this kind of content creation. So that’s what I decided to use for now.

    Jekyll does require some work to set up and probably to maintain, but at least it frees me to create the content in any way I like. Many of the things that I currently need to do manually would probably have been taken care of by plugins on third party blogging services, but that’s the price of doing this in a more customizable way.
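
    For reference, getting started with Jekyll is a fairly small job; a minimal sketch, assuming Ruby and RubyGems are already installed:

      gem install jekyll    # install the static site generator
      jekyll new blog       # create a skeleton site
      cd blog
      jekyll serve          # build the site and serve it locally at http://localhost:4000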

    Coming up with some basic information

    Should I add a personal biography to the blog, or should people who have not met me know me just by my name and the content that I write? I have decided to write a little bit about myself, which will hopefully shed some light on who I am. But I have decided not to use the author page as my curriculum vitae, as it would get too long and boring.

    Deciding how this blog should look and feel

    As a general rule when making personal web pages, I try to make them load fast and avoid images. With this blog, I just took the default Jekyll theme and started customizing it in the places where I want to have more information visible. As new devices with different resolutions come onto the market and browsers get updated, I may need to change this to something that better supports the responsive nature of the website I hope to have.

    One interesting direction web development is going in, regarding pages optimized for mobile devices, is to provide Accelerated Mobile Pages versions of the content. Currently I have decided to write as much content as possible in Markdown, which is processed by the kramdown Markdown processor. kramdown doesn’t natively support Accelerated Mobile Pages yet, so I have opted out of using them for now. In the future it would be really nice to have lazy image loading and support for responsive images with different sizes and device support.

    Favicon

    A favicon is a small image that should make a site stand out from the rest when you have multiple tabs open in a browser, in bookmark lists, and when website/application icons are created on a device or operating system. I had made a favicon for my web site ages ago, and I still have the original high resolution version, so now I decided to see what kinds of different formats and sizes there are for these icons. Fortunately I didn’t have to look far, as I found a website called RealFaviconGenerator.net that takes care of converting the icon into the different formats for me.

    I was a little blown away by how many different targets there are for these kinds of icons. The generated file collection included 27 different images and 2 metadata files, in addition to the code to add to the website header. A current list of them looks like this:

    In a couple of years we will probably have new systems and resolutions to support, so I will need to regenerate these favicons again to cover those new systems.

    Responsive images

    Some images that I produce are originally made with Inkscape in a vector graphics format. Unfortunately browser support for vector graphics formats is appalling, so the next best option is to use responsive images that provide different resolution versions of an image based on the display device’s capabilities. Every vector graphics image gets six scaled raster versions (100%, 125%, 150%, 200%, 300%, and 400%) plus a PDF generated from it. The raster versions are used in the srcset image attribute, which lets devices with higher density screens get crisper images; the higher density versions are also used in modern browsers when the page is zoomed. The PDF versions are only used as clickable links, as browser support for PDF files as images is not really that good yet.
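
    A sketch of how those scales can be exported from an SVG with the Inkscape 0.9x command line; the 800 pixel base width is a made-up example:

      for scale in 100 125 150 200 300 400; do
        width=$((800 * scale / 100))
        inkscape --export-png="figure-${scale}.png" --export-width="$width" figure.svg
      done
      inkscape --export-pdf=figure.pdf figure.svg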

    Optimize everything!

    When writing content or doing anything with the layout, I naturally want it to be as easy to edit as possible. That also means that if I don’t minify the HTML, CSS, images, and other resources served with the page, I lose on page load time. I would like Jekyll plugins to optimize everything that is produced, but unfortunately only Sass/SCSS assets provide easy minifying options. So the page HTML is not minified, even though in the front page’s case minification could reduce its size by a few percent.

    Every PNG image on this blog is optimized using ZopfliPNG. It produces around 10% smaller images than OptiPNG, the standard open source PNG optimizer. ZopfliPNG takes its time, especially on larger images, but as the optimization only needs to be done just before publishing, it’s worth doing, as it can easily save 10–50% in image size. Also, if PNG images are not meant to be as-high-quality-as-possible representations of something, their color space is reduced with pngquant, provided the reduction does not cause significant quality deterioration. This can result in savings in excess of 50% while still keeping the image format appropriate for line drawings and other high frequency image data.
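
    A sketch of the optimization commands; the quality limits are just example values:

      # Lossless recompression; -m spends more iterations for a smaller file.
      zopflipng -m figure.png figure-optimized.png

      # Lossy palette reduction; pngquant refuses to write an output file
      # if it cannot stay within the requested quality range.
      pngquant --quality=70-95 figure.png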

    Every article has its own identicon, and that would result in an extra request if I added it as a regular image. Instead, I embed it in the page source using the base64 encoded data URI scheme, as these page identifier images only come to around 200 bytes. I could shave off some more bytes by selecting whichever image format takes the least space to encode the same information, but I don’t think I need to go that far, as a page usually has far more text than image data.
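
    A sketch of producing such a data URI from a small PNG, assuming GNU coreutils base64 (-w0 disables line wrapping):

      echo "data:image/png;base64,$(base64 -w0 identicon.png)"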

    Another optimization is to move all content that is not required for the initial page rendering to the end of the page. This also makes it possible to load some content asynchronously after the page has initially loaded. In practice it mostly means that I have moved all the JavaScript this page has to the end of the page. Google’s PageSpeed Insights also gives good hints on what else to optimize, and one suggestion is to either inline CSS or load it asynchronously. Unfortunately Jekyll does not make it easy to include the generated CSS file as part of the document, so instead of fighting it, I have kept the CSS as a separate file. Loading that CSS file asynchronously unfortunately creates an annoying effect where the page is first rendered and the style sheets are applied afterwards, which is visible because of the hamburger button, so I left the main style sheet to be loaded synchronously in the header.

    One more place to optimize would be the style sheets these pages have, but as this layout should be lightweight enough already, I hope I can survive without optimizing the things that affect layout rendering.

    The front page of this blog shows 5 articles with their full content visible. I have opted for that simply because I have noticed that I like reading blogs where I don’t need to open any pages to get to the content. Having fewer articles per page would make the front page load faster, and having more would help avoid page switches, so I hope this is a decent compromise.

    All size optimizations are, in the end, pretty much overwhelmed by just adding Disqus as a commenting platform, as it loads much more data than an average article and also includes dozens of different files that a browser needs to fetch.

    Two-way communication

    One of the powerful features of any blogging platform is that it enables context related two-way communication between the author and the readers; in other words, comments. As I’m using static pages, the easiest solution would be to just leave commenting out. But I decided to opt in to Disqus as the commenting system for now, especially because there is a nice post explaining what you need to do with Jekyll to use Disqus on GitHub Pages.

    Disqus has its haters and its issues, but I think it’s the best bet for now. If I end up spending most of my time deleting spammy comments, I will probably just abandon commenting altogether and let people comment on Twitter or by some other means, if they feel that’s necessary and are interested enough.

    Linking to myself

    As I have some presence on multiple social media sites and some personal project resources, I have decided to include the most current ones in the footer of the page.

    Is there actually anyone who reads my blog?

    As I have opted to use a third party web hosting service, I don’t have access to its logs and therefore can’t really know how many page loads I get. So I take the second option and use the full blown Google Analytics service for this blog. It also provides much better data analysis features than any of the web server log analyzers I have used, so at least by using it I should have some kind of idea of how many people actually visit this blog.

    Also using FeedBurner should provide me statistics on how many people have subscribed to my blog.

    Article feeds

    For me, article feeds have two purposes: to provide the subscribers of this blog easy access to new articles, and to give me some kind of idea of how many people are regularly reading it. Displaying the full article content in the feed is the approach I like when reading my own feed subscriptions, as I can read the whole article in the feed reader. I decided to use a third party feed service, FeedBurner, to get feed subscriber statistics. FeedBurner also provides services I can use to further modify my feed, like making it more compatible with different devices and feed readers.

    One thing about a website with a lot of content is that it should be possible to find that content. One way is to add a search function to the website, but that’s not easy to do when the website is a collection of static pages. For now I have decided against adding a separate search widget and hope that the article list is enough. If there are specific articles that interest people, web search engines should bring those up.

    Adding share buttons for social media services seems like a no-brainer today. I was already exploring AddToAny service to add such buttons to my site, but then I decided to do a little bit of background research and think about the effects of such buttons.

    A quick search returns articles that suggest avoiding social media share buttons (1, 2, 3) and that question their actual value for a website. I checked that AddToAny creates 3 extra requests per page load. Although such requests are probably cached in a desktop browser, mobile browsers don’t necessarily have such luck with caching and will then suffer quite a large extra delay while loading those resources.

    The share buttons would also needlessly clutter my otherwise minimalistic layout, even if I’m not a graphically talented designer who could give a professional opinion on this. And I have had horrible experiences browsing the web on a phone where these kinds of social media sharing buttons take up real estate on the screen and contribute to an already slow mobile experience.

    These things taken into account, I won’t be adding social media sharing buttons here, as their downsides seem to be bigger than the upsides.

    Adding metadata

    Adding metadata to pages should make it easier to share these articles around, as different services are then able to bring out the relevant content on those pages. The major metadata collections for blog posts used all over the web seem to be Facebook’s Open Graph objects, Twitter Cards, and the Rich Snippets for Articles that Google uses. To use all of these, you need to have the following information available:

    • Information about the author:
      • Name
      • Web page
      • Twitter handle
    • Information about the organization that produces this content:
      • Name
      • Logo
      • Twitter handle
    • Information about the article:
      • Title
      • Publishing date
      • Modified date
      • Unique per article image, different sizes
      • Article URL
      • Description/excerpt
      • Tags/keywords

    Not all of this information is used by every metadata collection, but you need to have this much data available if you want to use all three of these services. Fortunately these services provide metadata validators:

    Unique article images

    The thing that caused the most issues was getting a unique per-article image done. That might make sense for news articles, but when writing software development related articles about abstract matters, it’s pretty hard to come up with images that do not look out of place or are not just completely made up. So I decided to completely make up images that are generated automatically if nothing better is specified.

    I’m using an identicon type of approach by creating four such icons from the SHA-1 of the article’s path (“/2016/01/software-development-blog” for this one). These icons are then combined either horizontally or into a 2x2 square grid, and the results are used as article images appropriate for different media. It’s a crude trick, but with a very high probability it generates a unique per-article image that can be visually identified. For example, this article has these identifier images:

    400x400 square identicon for this article.

    800x200 horizontal identicon for this article.
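
    For the curious, the per-article seed is just the SHA-1 of the article path; a minimal sketch of computing it, assuming GNU coreutils sha1sum:

      printf '%s' "/2016/01/software-development-blog" | sha1sum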

    Monetizing blogging

    There are various ways one can monetize blogging, of which ads are usually the most visible. As I’m just starting out and don’t have any established readership, it would be a waste of my time to add any advertisements to these pages. If I get to 1000 monthly readers, I will probably add some very lightweight ads, as that way I could use the money for one extra chocolate bar per month.

    How timeless is this?

    Probably in 5 years I will have lost interest in blogging. Most of the pages that I’m now linking to will be gone. Many of the subjects will be outdated, either because technology has moved forward or because development methods have. Jekyll will be either in maintenance mode or a completely abandoned project. If I continue blogging, I might even have started using some real blogging service or some other blogging platform.

    Conclusion

    I have not been actively involved in web development during the last couple of years, and this provided me an excellent opportunity to see what kinds of web development related advances and changes there have been. The biggest surprise comes from the metadata usage and how many different systems expect to have icons in different sizes and formats.

    I generally try to put usability and speed before looking fancy. In this blog, however, the need for speed is not always satisfied, as having the commenting system hosted externally adds a lot of extra data to transfer and makes these pages heavier for low powered devices.
