
> This has literally never come up for me in 5 years of using git

I get the impression that, unlike the article's intended audience, you know git well enough to not need to follow its advice. When giving advice, you have to aim at the expected level of understanding. That can easily lead to examples that frustrate people with a higher level of understanding, because they seem to over-simplify.

In this case, you're taking me too literally - I was trying to give quick & simple examples for a complex issue. It's just an illustration of how "simple linear history" is a non-sequitur - you now have to factor in that a specific commit could very easily have been made after a commit that comes after it in the history. You're moving complexity around, not reducing it.

> Rebasing doesn't destroy history, it rewrites it.

Rewriting is destroying.

Worse, it's lying - it's saying "These commits were made in this order" when they weren't.

> Why does a rebase-based workflow lead this to be a catastrophe?

It doesn't. You've ignored the context - which was of someone who wants his commits to be de-bugged and habitually uses rebase to do so.

> you shouldn't use rebase if you don't know what you're doing with git

That was kind of my point - people are using rebase instead of learning to use git. They should therefore be advised to stay away from it until they have got the fundamentals down.

> All of this is just a distraction to the core questions:

Sorry, those might be your core questions but they weren't mine: I was writing specifically to counter the suggestion that rebase should be heavily-used by people who don't really understand its ramifications.



Rebase is not a lie, it is the developer explicitly putting commits in this order.

Best practice certainly depends on your environment. In small projects, let's say two/three devs working with a couple dozen tightly-coupled files (like HTML/CSS/JS), you have to keep up with changes frequently; that means pulling all the time, otherwise you'll always have monster conflicts to solve. That 'tiny window of opportunity' when someone else has pushed and you have changes is all the time. In a codebase where you can happily work on an isolated feature for a week, things are different.


And you'll ultimately have to decide whether you want your codebase to represent a series of physical commits, bits that people happened to type out... or whether it should represent a series of logical changes to the codebase.

But git and git-rebase were explicitly designed to make the "series of logical changes" easier, because git was designed for Linux kernel development, and they like series of logical, meaningful changes over there because it assists in understanding the changes which are being integrated.

Now, I'm sure there's some business case where the series-of-physical-changes sequence is more important to someone. I don't personally agree that's the best way to program in general, but not everyone's compelled to agree with me. And in that case, don't use rebase.


> And you'll ultimately have to decide whether you want your codebase to represent a series of physical commits, bits that people happened to type out... or whether it should represent a series of logical changes to the codebase

We're on the same page here, but there's a deeper point being hinted at: a physical series of commits is useful when you're in experimenting-dev/refactor mode, and isn't useful at all for people trying to understand your history. Well, mine's not at least -- it's a lot of shitty diffs and "fuck shit checkpoint" commits that don't live long.

Logical history is basically always useful, but you often want temporary checkpoints for yourself before your changes are going to coalesce into something that represents a logical patch-set.

Rebase, in its most commonly useful use case, lets you replay history you haven't shared yet on top of new incoming history that other people have shared. For the people other than you, your changes exist in the future.

History rewriting in general lets you fuck around locally with stuff you haven't shared, making trashy checkpoint commits or whatever you want to do, and clean that all up to represent a logical series of changes for sharing.

In short:

Don't work on tracking branches. Branches are cheap.

Periodically bring others' changes into your work with rebase. Your work isn't public and theirs is: theirs is part of public history, yours is part of future history. Always rebase before bringing your changes back into the tracking branch, and then merge your changes into the tracking branch. Often, the history cleanup I want to do is handled by merge --squash, bundling up the change I'm pushing as one commit.
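
For concreteness, here is a minimal sketch of that workflow in a throwaway repository. The branch names `main` and `topic`, the file names, and the commit messages are all made up for illustration, and the "someone else's change" is simulated with a local commit rather than a real pull from a shared remote.

```shell
#!/bin/sh
set -e

# Scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the initial branch "main"
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
git config commit.gpgsign false

echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Do the work on a private branch, not on the tracking branch.
git checkout -q -b topic
echo "half done" > feature.txt
git add feature.txt
git commit -q -m "checkpoint"
echo "done" > feature.txt
git commit -q -a -m "fix typo"

# Meanwhile, public history moves on (simulated here with a local commit).
git checkout -q main
echo "other" > other.txt
git add other.txt
git commit -q -m "Someone else's change"

# Replay the private commits on top of the public history.
git checkout -q topic
git rebase -q main

# Bring the work back as a single logical commit.
git checkout -q main
git merge -q --squash topic
git commit -q -m "Add feature BAR"

# main now shows three commits; the checkpoint commits stay on topic only.
git log --format=%s main
```

With a real remote, the simulated commit would instead arrive by updating the tracking branch, and the rebase step would be repeated whenever that happens.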

You, whoever is looking at my incoming change, care about the fix for bug FOO or feature BAR being completed. You don't care that I made a bunch of stupid typos. If you do, I don't want to work for you unless you care because you're doing some sort of awesome stat analysis on typos made by people like me.


Exactly. An author of a book may have written chapter 23 before chapter 4, but the reader doesn't care. They want to see a logical progression.


> Rebase is not a lie, it is the developer explicitly putting commits in this order.

See my comment on the post - rebasing absolutely is a lie. Sometimes it's a good idea to lie; sometimes the lie is only the tiniest of fibs. But it's always a lie.


I guess "lie" is too negative a word to use here. It's no more a lie than leaving out all the different revisions of the code written before the final version.

As you said, sometimes it's a good idea to refactor the history, or not comment on algorithms that were not chosen for the problem at hand. But there are times when that information might have some value.


Rebasing isn't saying "these commits were made in this order" when they weren't, it's explicitly choosing to make those commits in that order.


> It's just an illustration of how "simple linear history" is a non-sequitur - you now have to factor in that a specific commit could very easily have been made after a commit that comes after it in the history. You're moving complexity around, not reducing it.

You are removing several sources of complexity when you rebase.

1) You have fewer logical branches to keep track of. Before you rebase, it's unclear what code in the main branch your feature branch depends on -- clearly it depends on all the things in the common ancestor, but if there were any merge conflicts, it depends on the resolution of all of those as well. Rebasing lets you easily see: "This commit depends on the commit immediately before it, dependencies on older code have already been resolved, no need to look elsewhere."

2) You remove a noisy merge commit. Merge commits, unless they are actually resolving a conflict between two branches of code that are both in use somewhere, serve no functional purpose. They only record that a particular physical operation was performed. If the merge is resolving some conflict that only one developer ever saw and only on his local machine, then why not rebase that content-less piece of information out of the historical record?

3) This is the softest point, but rebasing encourages manual curation. On average, I would expect that this increases the quality of commit messages, and encourages the removal of commits that are just useless noise. The developer wrote one commit message in the middle of work, or maybe just because it was 5pm, and it reflected his thinking at the time. Later, after he knows the context the commit falls in, he looks at and evaluates it. I think of it as revising a draft -- sometimes you got it right the first time, sometimes you didn't. In any case the end product is unlikely to be worse off unless someone is throwing away EVERYTHING by squashing down to one commit and giving it a message "squashed".

In any case, the end result really isn't any more complicated. Take your example: you want to know what caused the bug on Monday at 8pm on the production server. You know that a particular commit was deployed, say "deadbeef". Is there really a meaningful difference between commits with timestamps before 8pm Monday that come after "deadbeef" on the master branch, and ones with timestamps before 8pm Monday that weren't yet merged?

If your problem is really, "I don't know what commit was on the production server at 8pm Monday, and I need a branch with linear timestamps in order to figure that out" then you have a totally different problem.
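
To make the ancestry point concrete, here is a rough sketch in a scratch repository. The dates, commit messages, and the `commit_at` and `in_build` helpers are invented for the example; `git merge-base --is-ancestor` is the only real piece it leans on. The last commit is given an earlier timestamp than the deployed one, as a rebased commit might have, and the ancestry check still gives the right answer regardless of dates.

```shell
#!/bin/sh
set -e

# Scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
git config commit.gpgsign false

# Hypothetical helper: make an empty commit with a fixed author/committer date.
commit_at() {
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git commit -q --allow-empty -m "$2"
}

commit_at "2020-01-06 12:00:00 +0000" "Initial commit"
root=$(git rev-parse HEAD)

commit_at "2020-01-06 17:00:00 +0000" "Deployed change"
deployed=$(git rev-parse HEAD)            # stand-in for "deadbeef"

# Sits after the deployed commit in history, but carries an earlier timestamp.
commit_at "2020-01-06 15:00:00 +0000" "Rebased change"
later=$(git rev-parse HEAD)

# Hypothetical helper: was commit $1 part of what was deployed?
in_build() {
    if git merge-base --is-ancestor "$1" "$deployed"; then
        echo yes
    else
        echo no
    fi
}

echo "root commit in deployed build:    $(in_build "$root")"
echo "rebased commit in deployed build: $(in_build "$later")"
```

The question asked is "is this commit an ancestor of what was deployed", which works the same way whether the history was rebased or merged.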


> I get the impression that, unlike the article's intended audience, you know git well enough to not need to follow its advice. When giving advice, you have to aim at the expected level of understanding. That can easily lead to examples that frustrate people with a higher level of understanding, because they seem to over-simplify.

Fair enough. I think I would have been more understanding without the unilateral declaration that rebasing more than once a week is a "process smell" as it were.



