Search: .lenght - Github

theli0nheart · on Jan 14, 2012

I remember seeing a Github bot a couple weeks ago that strips out whitespace and adds a .gitignore file to a repo (I also remember this really rubbing some people the wrong way). This search indicates that it would probably be useful to have a linter bot running on Github for all the popular languages. It would find syntax errors, common misspellings, and compilation issues, and then submit pull requests to fix the issues.

I have no time to work on something like this myself, but I'm sure a lot of people would find it useful, especially if it acted as a "first defense" before deployment. Curious what other HN'ers think about this.

Mizza · on Jan 14, 2012

I wrote that bot!

https://github.com/Miserlou/WhitespaceBot

Feel free to fork it to do whatever you want, that's why I made it.

Stratoscope · on Jan 14, 2012

Hey, some of us actually use trailing whitespace! :-)

I use it to create useful indentation guides in Komodo. If the whitespace is stripped away, the indentation guides have gaps where there's a blank line in an indented block.

Maybe Komodo could use a different method to decide where to draw the lines that didn't depend on trailing whitespace. It looks like Sublime Text 2 has a different approach for its indentation guides - maybe the Komodo guys should look at that. But in the meantime I'm using Komodo as it actually works today, so the whitespace on blank lines is important. Let me keep it, please? :-)

I wouldn't mind stripping out trailing whitespace on nonblank lines - that wouldn't affect my precious indentation guides.

But wait a minute, what about Markdown? Two spaces at the end of a line to get a <br>, right? Does the bot skip Markdown files?

Finally, for the folks who have automatic whitespace removal in their editor settings... Please be careful: with this setting, you'll be very likely to make a commit that includes both significant code changes and a mass of whitespace changes in the same commit.

Those kinds of changes should be separated: one commit for the code itself, and a separate commit for the whitespace with a comment like "Whitespace cleanup, no code changes."

This allows people who diff the revision history to diff with whitespace significant most of the time, the only exception being when reviewing a whitespace-only change.

(Edited for friendlier tone...)

rmccue · on Jan 15, 2012

From a quick look at the source, WhitespaceBot already excludes Markdown files:

    banned = ['.git', '.py', '.yaml', '.patch', '.hs', '.occ', '.md', '.markdown', '.mdown']
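For illustration, here is a rough JavaScript sketch of how an extension list like this could be applied. The suffix-matching rule and the helper names `isBanned` and `cleanFile` are assumptions; what the bot actually does with the list is not shown in the quoted line.

```javascript
// Extensions copied from the list quoted above.
const banned = ['.git', '.py', '.yaml', '.patch', '.hs', '.occ', '.md', '.markdown', '.mdown'];

// Hypothetical helper: true when the file name ends with a banned extension.
function isBanned(fileName) {
  return banned.some((ext) => fileName.endsWith(ext));
}

// Hypothetical helper: strip trailing spaces and tabs on every line,
// unless the file is on the banned list (e.g. Markdown, where two
// trailing spaces mean a line break).
function cleanFile(fileName, text) {
  if (isBanned(fileName)) {
    return text;
  }
  return text.replace(/[ \t]+$/gm, '');
}
```

With this sketch, `cleanFile('README.md', text)` returns the text untouched, while a `.js` file loses its trailing spaces and tabs.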

mook · on Jan 17, 2012

If you create a JS macro that is triggered on file open, with the contents "komodo.view.scimoz.indentationGuides = komodo.view.scimoz.SC_IV_LOOKBOTH", it should give you the indentation guides without the whitespace actually there. http://www.scintilla.org/ScintillaDoc.html#SCI_SETINDENTATIO... has some explanation of the possible values.

hartez · on Jan 14, 2012

Is there any equivalent of the robots.txt standard for public code repositories? Being able to opt-in to certain bots might be helpful (opt-out being the default, of course).

gerggerg · on Jan 15, 2012

That's a really cool suggestion and I would love to see some pseudo-standard on this.

xyzzyb · on Jan 14, 2012

Ah, aggressive trailing whitespace removal. That I can completely get behind. I've already got command-s bound to a custom macro that strips trailing whitespace in TextMate for myself and my co-workers; but this would be an even more inclusive solution.

tensafefrogs · on Jan 14, 2012

Fantastic. If you use vim, you should have this in your .vimrc:

    " Remove any trailing whitespace that is in the file
    autocmd BufRead,BufWrite * if ! &bin | silent! %s/\s\+$//ge | endif

illicium · on Jan 14, 2012

I prefer using the vim-trailing-whitespace plugin and fixing it manually: https://github.com/bronson/vim-trailing-whitespace

throwaway392 · on Jan 15, 2012

Or you can use my competing plugin: https://github.com/bitc/vim-bad-whitespace

which has some advantages (described in the README)

jdwhit2 · on Jan 15, 2012

One advantage taken from the readme:

  This plugin is better than using the builtin vim 'list' command because it
  doesn't show an annoying highlight while you are typing in insert mode at the
  end of a line.

snprbob86 · on Jan 15, 2012

Or, if you prefer a more manual approach (i.e. my fellow paranoid developers), simply use `list` and `listchars`:

    set list
    set listchars=trail:•

In other people's code, use `set nolist` to prevent hyperventilation.

In general, explore `help list`.

phillmv · on Jan 14, 2012

Uhm. Why is this annoying, other than the fact that it shows up in your git commits?

Mizza · on Jan 14, 2012

If you use Vim from the terminal to edit text (which I do), trailing whitespace shows up as big white blocks. It's slightly visually distracting, but it's really just an OCD thing.

gbog · on Jan 15, 2012

No.

Absence or presence of trailing whitespace is sometimes significant, such as in a big regex in re.VERBOSE mode in Python.

Therefore, it must be visible, and if it is visible it must be removed when it has no use.

yxhuvud · on Jan 14, 2012

I prefer to have my editor strip that whenever I save a file without any manual action.

xyzzyb · on Jan 14, 2012

I should clarify that command-s is the save command. My macro overrides the standard behavior.

xxqs · on Jan 14, 2012

in my 10-year-old project, removing all trailing whitespace would produce a 1000+ line commit, and make many contributors' lives harder.

so, thanks, but I'll keep my whitespace.

tux1968 · on Jan 14, 2012

Perhaps it would be more productive to advocate the use of local pre-commit hooks. Git makes it very easy to configure validation locally long before anything gets sent to Github.

Would be nice if Github provided better documentation and a selection of validation templates to include in new projects. This would better leverage the power of Git and its distributed nature than a bot running on Github.
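As a rough idea of what such a local check could look like, here is a sketch of a function that a pre-commit hook script might run over each staged file. Git runs the executable at .git/hooks/pre-commit before a commit and aborts the commit if it exits non-zero; the function name `findProblems` and the two rules it applies are made up for this example.

```javascript
// Hypothetical check for one file's contents. Returns a list of
// { line, reason } objects, one per problem found.
function findProblems(text) {
  const problems = [];
  text.split('\n').forEach((line, index) => {
    // Rule 1 (assumed): flag spaces or tabs at the end of a line.
    if (/[ \t]+$/.test(line)) {
      problems.push({ line: index + 1, reason: 'trailing whitespace' });
    }
    // Rule 2 (assumed): flag the misspelled property access.
    if (/\.lenght\b/.test(line)) {
      problems.push({ line: index + 1, reason: 'possible typo: .lenght' });
    }
  });
  return problems;
}
```

A hook script would read the staged files, call `findProblems` on each, print the results, and exit with a non-zero status if any were found.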

campnic · on Jan 14, 2012

I'll look for more on this, but if you had something you would recommend as a tutorial, I'd appreciate it.

tux1968 · on Jan 14, 2012

Didn't have a specific example in mind; there is definitely opportunity for Github to help with education and adoption.

For some reference material check out:

  http://book.git-scm.com/5_git_hooks.html
  http://progit.org/book/ch7-3.html

And a couple simple examples:

  http://mark-story.com/posts/view/using-git-commit-hooks-to-prevent-stupid-mistakes
  https://github.com/ReekenX/git-php-syntax-checker

andrewcamel · on Jan 14, 2012

I actually just replied with a suggestion that Github should implement some built-in functionality for simple error-checking. I think it's a great idea. It would be really helpful if you got a simple little list of notifications for a commit indicating possible error points.

gravitronic · on Jan 14, 2012

avoid feature creep! github's platform allows for bots to run, a bot is a perfect way to implement this instead of it being "built-in".

Noughmad · on Jan 14, 2012

So basically something like Krazy does with KDE: http://ebn.kde.org/krazy/

zeratul · on Jan 14, 2012

I thought this is actually interesting but I would like to know if >4k misspellings is a lot or not. Here is one way to do it:

    LANGUAGE     #LENGHT  #LENGTH  = #LENGHT/#LENGTH
    JavaScript      4252  2907459  = 0.0015
    C              18981  2902857  = 0.0065
    Java            7706  2348900  = 0.0033
    Ruby           10789  1690604  = 0.0064
    C++             9458  1315552  = 0.0072
    PHP             3116  1167924  = 0.0027
    C#              1352   937647  = 0.0014
    Python          3662   737292  = 0.0050
    Ruby            1232   380484  = 0.0032
    Perl            1239   258892  = 0.0048
    Objective-C      679   238051  = 0.0029

P.S. There is something wrong with Github's language breakdown algorithm: sometimes it shows the same language twice with a different number of hits.
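The same normalization can be reproduced in a few lines. This sketch uses the counts from the table above; the names `counts`, `typoRatio` and `ranked` are just for illustration, and the duplicate Ruby row is kept as reported.

```javascript
// Counts copied from the table above: [language, "lenght" hits, "length" hits].
const counts = [
  ['JavaScript', 4252, 2907459],
  ['C', 18981, 2902857],
  ['Java', 7706, 2348900],
  ['Ruby', 10789, 1690604],
  ['C++', 9458, 1315552],
  ['PHP', 3116, 1167924],
  ['C#', 1352, 937647],
  ['Python', 3662, 737292],
  ['Ruby', 1232, 380484],
  ['Perl', 1239, 258892],
  ['Objective-C', 679, 238051],
];

// Ratio of misspelled to correctly spelled hits, rounded to four decimals.
function typoRatio(typoHits, correctHits) {
  return Math.round((typoHits / correctHits) * 10000) / 10000;
}

// Lowest ratio first, so the "most careful" language ends up at index 0.
const ranked = counts
  .map(([language, typoHits, correctHits]) => ({
    language,
    ratio: typoRatio(typoHits, correctHits),
  }))
  .sort((a, b) => a.ratio - b.ratio);
```

Dividing by the number of correct hits is what makes the languages comparable: a raw count of 4252 looks large until it is set against almost three million correct uses.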

jond3k · on Jan 15, 2012

I noticed another problem is that the highlighter will select the language name as well as the term which means <?php and (c) Copyright are shown instead of the actual mistake.

I put together a GitHub Illiteracy Index script https://github.com/jond3k/sandbox/tree/master/github-illiter... which you can play around with if you like :D

koenigdavidmj · on Jan 14, 2012

C# I can understand being so low, since it's almost always written in Visual Studio or MonoDevelop (both of which provide autocompletion). But how is JavaScript the next lowest?

JesseAldridge · on Jan 14, 2012

Hmm, I guess it's because length is a commonly used function in Javascript, but not in other languages. In Python you do len(list) instead, so the word length is more likely to appear in comments and therefore less likely to be corrected.

sjwright · on Jan 14, 2012

Because the built-in .length property is frequently used, and will fail if misspelled.

Whereas C and C++ programs tend to have a lenght operator implemented by the programmer, and from there the error gets snowballed by IDEs and debuggers.

untog · on Jan 14, 2012

C# is also a compiled language, so you wouldn't be able to get anything to run with a typo like that hanging around. I find it surprising that there are so many commits making that mistake!

tshaddox · on Jan 15, 2012

Being a compiled language isn't sufficient to prevent "no method" errors. It's completely possible for a compiled language to define methods at runtime or use duck typing.

lancefisher · on Jan 14, 2012

It shows up in comments and variable names frequently.

cleaver · on Jan 15, 2012

Odd that compiled languages (C, C++, Java) are higher than some interpreted languages (PHP, Javascript). Of course, the search will match comments as well as code, so it may just mean they have better comments.

Also fun to search on "functino".

RandallBrown · on Jan 15, 2012

I can't think of anything in C or C++ that uses length off the top of my head. Size and Len, sure, but nothing that's length.

I would guess that most of the spelling errors get propagated through autocomplete. That's how most of the spelling errors in my code get there anyway.

rorrr · on Jan 15, 2012

    LANGUAGE   #LENGHT    #LENGTH    = #LENGHT/#LENGTH  
    C#            1352     937647    = 0.0014   <- best
    JavaScript    4252    2907459    = 0.0015
    PHP           3116    1167924    = 0.0027
    Objective-C    679     238051    = 0.0029
    Ruby          1232     380484    = 0.0032
    Java          7706    2348900    = 0.0033
    Perl          1239     258892    = 0.0048
    Python        3662     737292    = 0.0050
    Ruby         10789    1690604    = 0.0064
    C            18981    2902857    = 0.0065
    C++           9458    1315552    = 0.0072   <- worst

josegonzalez · on Jan 14, 2012

For the record, Github's search index is wayyy out of date sometimes. The second user here is me and I deleted that user like two years ago: http://cl.ly/0y271f0T3G0X2J1L022E

eik3_de · on Jan 14, 2012

Same here, I contacted support two times in two years and they said "we're working on it". Obviously, that isn't true and they just don't care about the outdated search index.

holman · on Jan 15, 2012

Your latter statement is unequivocally false, for what it's worth. On both counts.

jc123 · on Jan 15, 2012

Perhaps some info about the progress being made would be more helpful, as eik3_de has mentioned contacting support over the past 2 years.

xcud · on Jan 14, 2012

'wtf' is a good search term when coming into contact with a new codebase; https://github.com/search?type=Code&language=JavaScript&...

jond3k · on Jan 15, 2012

  #  Language    Illiteracy
  1  C           0.02877583  
  2  Perl        0.01635618  
  3  Ruby        0.01560477  
  4  JavaScript  0.01330989  
  5  Shell       0.01235425  
  6  Python      0.01046104  
  7  PHP         0.00910218  
  8  Java        0.00736395

(For height, length and hierarchy, averaged out)

And you thought this would end up being a PHP joke...

https://github.com/jond3k/sandbox/tree/master/github-illiter...

timdorr · on Jan 14, 2012

"hieght" is also a good one: https://github.com/search?type=Code&language=JavaScript&...

alpb · on Jan 14, 2012

not an attribute of a standard object type, though.

sjwright · on Jan 14, 2012

Neither is length in many languages.

mark_story · on Jan 14, 2012

But length is in Javascript. Both String and Array have that property.

sjwright · on Jan 14, 2012

Which is why I said many and not all.

angrycoder · on Jan 14, 2012

Even the search has a bug. The query is for ".lenght" but many of the highlighted results are just lenght without the dot.

cpr · on Jan 14, 2012

Prob a reg exp so matches any char...

andrewflnr · on Jan 14, 2012

But then shouldn't the highlighted bit include the char in front? It's also case insensitive. I think it's trying to be clever.

mattdeboard · on Jan 15, 2012

More likely is form/input validation.

southern · on Jan 14, 2012

A common typo, it seems. But I'm a bit confused as to why this was submitted.

amirhhz · on Jan 14, 2012

In JavaScript, if you check for a non-existent property on a variable (e.g. aVar.lenght vs aVar.length) it will return "undefined". So people often rely on this behaviour to check if something is an array or not (no comment on whether this is good or bad), with:

    if(somethingThatMightBeAnArray.length){
        // do things with array
    }

So misspellings of length may be making a lot of code out there behave in unexpected ways.
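A minimal sketch of that behaviour, with `items` and `countItems` as made-up names:

```javascript
const items = ['a', 'b', 'c'];

// Reading a property that does not exist throws nothing; it yields undefined.
const correct = items.length;
const misspelled = items.lenght;

// Hypothetical function using the check above, but with the typo in it.
function countItems(somethingThatMightBeAnArray) {
  if (somethingThatMightBeAnArray.lenght) {
    return somethingThatMightBeAnArray.length;
  }
  // Always reached: the misspelled property is undefined, which is falsy.
  return 0;
}
```

Here `correct` is 3 and `misspelled` is undefined, so `countItems(items)` quietly returns 0 instead of failing.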

jpeterson · on Jan 14, 2012

The same pattern is widely used to test whether an array-like object is empty. Since a length of 0 is also "falsey", it evaluates as false when the array has no elements. A typo in this case would result in the tested array always being "empty".
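Sketched out, with `isEmpty` and `isEmptyWithTypo` as hypothetical helper names:

```javascript
// The emptiness test described above: a length of 0 is falsy.
function isEmpty(list) {
  return !list.length;
}

// The same test with the typo: the property is undefined for every input,
// so every list is reported as empty.
function isEmptyWithTypo(list) {
  return !list.lenght;
}
```

`isEmpty([1, 2])` is false, but `isEmptyWithTypo([1, 2])` is true, and nothing in the language flags the difference.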

thousande · on Jan 14, 2012

That is a bad thing, as String also has a length property:

    "bar".length; // 3

kaffeinecoma · on Jan 14, 2012

In a static language this would be flagged as an error. I assume something less than ideal happens in languages such as Ruby.

I once worked at a company where a very early piece of code had a typo "properites" instead of "properties". This misspelling became institutionalized, and was used throughout the codebase because it was deemed too expensive to fix. And this was with a static language (with good IDE refactoring support)!

Wilya · on Jan 14, 2012

In ruby, and I think most dynamic languages, this type of typo is likely to raise an exception. It could hurt, but a simple test run is likely to discover it.

The way javascript (which is what is linked) handles this, as amirhhz described it, leads to silent errors which could turn out a lot worse.

kaffeinecoma · on Jan 14, 2012

Yes, but it would raise the exception at run-time, and only when the particular path is taken.

xxbondsxx · on Jan 15, 2012

There's actually no exception, it just returns "undefined" and the if statement fails. That's why it's such a deadly bug -- no exception, and path dependent. Combine that with the async nature of JS and it's going to be a long night tracking that one down

lopopolo · on Jan 14, 2012

There are ways to raise this sort of error even in static languages: objc_msgSend comes to mind

palish · on Jan 14, 2012

I'm confused as to why they couldn't simply:

  grep -R properites .

kaffeinecoma · on Jan 14, 2012

The whole software infrastructure was a scary house of cards. They were afraid that there was unknown code that might be depending on it. For example RESTful services in other departments that were not under our immediate control.

storborg · on Jan 14, 2012

Perhaps the same reason why "referer" has not been corrected to "referrer".

kaffeinecoma · on Jan 14, 2012

Yes, it was exactly like that- lots of code had grown around the "bug", and it was not immediately obvious what other software had come to depend upon it. "Little hairs", as Joel might say.

evanlong · on Jan 14, 2012

Obviously you have never worked on a code base with 1000s of developers. If you edit almost every file then basically everyone needs to stop writing new code while the change is made. Otherwise the merges others have to do are going to be a disaster.

adambyrtek · on Jan 14, 2012

Honestly, do you really work on the same code base with "1000s of developers"? I find it really dubious.

evanlong · on Jan 14, 2012

Yes. It's called Microsoft.

adambyrtek · on Jan 16, 2012

Well, then problems with such process are pretty well documented[1]. For comparison, at Google global refactorings are pretty common and usually painless, there are even custom tools to support such changes (push them through code review, ensure no tests are broken etc.)

[1] http://moishelettvin.blogspot.com/2006/11/windows-shutdown-c...

evanlong · on Jan 17, 2012

I know the Microsoft process all too painfully. RI,FI,RI,FI,RI,FI,RI,RC,RTM,Ship It,Repeat.

But as to the "usually painless" at Google. So when does that pain happen?

Can you take me through the following scenarios: change a variable name, change a base class name that lots of people extend from, file renames?

How do you go about refactoring? Do everything at once? Breaking it into pieces? Do file rename then variable and base class renames? Or smallest piece at a time?

Once the refactoring is complete how do you communicate to others the changes so when they merge the code in they don't get too messed up? Or worse undo something in the refactoring. (Also follow up is it better to do the big refactoring so there is the one big merge or a bunch of little refactorings and lots of little merges across the spectrum).

I guess the code change isn't the problem. Making a big change and getting people on the same page is much harder, especially when there are varying degrees of skill and experience on a project. And it's this stuff that is painful and leads to not wanting to do big refactorings at a lot of shops.

adambyrtek · on Jan 18, 2012

Hacker News has a short attention span and this probably won't be seen by many, but I'll try to answer your question nevertheless. There are several factors I'd like to mention.

1. Most importantly, the version control head is always the point of reference, and the burden of merging is on people who keep long-lived pending changes. This means that conflicts are resolved as soon as possible by a person who actually knows the context, instead of being postponed until a dreaded merge window. Ultimately, a programmer pursuing refactoring is only responsible for making sure it works on the head, and should announce the change so that others are prepared for merging.

2. There are some huge code bases at Google, but nowhere near the size of Windows. On the other hand, I'm sure that even Windows has to be separated into more or less decoupled components. When I doubted that you work on the same code with thousands of other programmers I was thinking in terms of components, not final products.

3. The cultural aspect shouldn't be disregarded. Code hygiene is encouraged at Google, and some people volunteer their 20% time to help with that. Moreover, there are some custom tools that make global refactorings much easier and safer.

Hope that was helpful.

evanlong · on Jan 19, 2012

I am not the average HNer... Thanks.

chernevik · on Jan 14, 2012

Perhaps they were using . . . something other than *nix.

EDIT: Justly downvoted <strikeout>chastised</strikeout> for attempting humor without understanding.

palish · on Jan 14, 2012

That command works nicely for me in Cygwin.

EDIT: Eh, apologies if this sounded like chastising -- I didn't mean to. As a developer who's been trapped in "Windows-mindset" for many years, I wanted to try to inspire other Windows devs to try to use *nix-based solutions even if their only option is Windows development. Cygwin is in a very good spot right now -- it's achieved so much acceptance that even the most hardened institutions now allow it to be installed.

chernevik · on Jan 14, 2012

No snarkier than my remark, and less snarky than it might have been for being so much smarter.

strictfp · on Jan 14, 2012

I had this problem as a junior dev when my English was weaker. The problem stems from the fact that 'height' is spelled with 'ht', but 'width' with 'th'. Since one often writes those words in conjunction, it is easy to mix the endings up. If you're then a non-native speaker and don't run spellcheck on your code, you might end up writing 'lenght' and 'heigth' quite a few times. I know I did :)

billpg · on Jan 14, 2012

My experience is more with languages that are typically compiled and would report this error as an error fairly early on, so the coder would correct it long before checking the code in.

What's the trade-off of having "undefined" returned instead of having an error reported as soon as the code is loaded?

nostrademons · on Jan 14, 2012

It prevents you from later defining a 'lenght' method and using it at runtime without a recompile.

For core methods like 'length', it seems silly to think that you'd want to redefine it. And indeed, it's usually counterproductive - that's why any experienced JavaScript dev will have coding conventions like "Don't muck with the prototypes of built-in objects."

But at the application layer, this can be really useful. Imagine you're adding a new field to a message deep in the storage system, and then you want to pass that along to a template in the rendered HTML. It's really useful to be able to do this without recompiling & restarting each individual server between the backend and the frontend, and just edit a few template files and have them automatically pick up any changes to backend data formats.

Ditto adding a new database column, if you're using an RDBMS - it's pretty handy to have your model objects instantly reflect the new field, instead of needing to manually add accessors to each of your model classes. Rails and Django are built on this principle.

Also, you have a versioning problem with statically-compiled code in a distributed system. Imagine that you add this new 'lenght' field to a backend message, and add it to the frontend, and they both compile & deploy. Now imagine that a message from an old backend hits a new frontend (it's not possible to upgrade a whole distributed system at once without downtime). What does the new frontend do with it? It needs a piece of data, but the backend had no idea that it had to provide that piece of data. The only thing it can do is return the equivalent of 'undefined'.

In C++/Java code, you usually deal with these by inventing frameworks. Google code, for example, is littered with

  if (msg.has_new_field()) {
    run_long_complicated_ui_display_routine(msg.new_field());
  } else {
    fall_back_to_old_behavior(msg.old_field());
  }

checks. If you use a more dynamic language like Python, you can use language mechanisms to represent undefined values or fields that are defined at runtime. If you use a static language, you're stuck mimicking them with hashmaps and null.
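In a dynamic language the same fallback needs no generated has_ method. A minimal JavaScript sketch, with `newField`, `oldField` and `render` as made-up names standing in for the message fields above:

```javascript
// Hypothetical frontend routine. A message from an old backend simply
// lacks `newField`, so reading it yields undefined instead of failing.
function render(msg) {
  if (msg.newField !== undefined) {
    return 'new: ' + msg.newField;
  }
  // Fall back to the old behaviour for messages without the new field.
  return 'old: ' + msg.oldField;
}
```

The cost is the one this thread is about: if the check were misspelled, every message would silently take the fallback path.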

aardvark179 · on Jan 15, 2012

Whether your language is compiled is not the issue, it's how you model objects and calling methods on them. In Smalltalk and other languages that take a message passing approach, doing a.b() sends a message "b" to object a, and the object can do anything it likes with that.

Now the normal (and optimized) route is to find the method on a’s method table and then call that, but if a doesn't have that method then a second method may be called to allow this to be handled. Once you have that sort of mechanism you can make ORM libraries that dynamically examine a schema at run time and generate accessor methods only as they are needed; decorators, proxies and many other patterns become wonderfully simple, and there are often many more opportunities for meta-programming at run time.

The downside is of course that it becomes harder to find errors when writing or compiling, but tight integration of your development environment with your runtime can help with this.
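JavaScript can approximate this with a Proxy (standard since ES2015), whose get trap plays the role of the second method that handles unknown names. A minimal sketch, assuming a made-up `getXyz()` naming convention and a hypothetical `wrapRow` helper:

```javascript
// Hypothetical record wrapper: any call to getXyz() is answered at run time
// by looking up the "xyz" column, so no accessor is written by hand.
function wrapRow(row) {
  return new Proxy(row, {
    get(target, name) {
      const isAccessor =
        typeof name === 'string' && name.startsWith('get') && name.length > 3;
      if (isAccessor) {
        // "getName" -> "name": lower-case the first letter after "get".
        const column = name[3].toLowerCase() + name.slice(4);
        return () => target[column];
      }
      // Everything else behaves like a plain property read.
      return target[name];
    },
  });
}
```

For example, `wrapRow({ name: 'Ada' }).getName()` returns 'Ada'. The downside mentioned above shows up here too: a misspelled call such as `getNmae()` is not rejected anywhere, it just returns undefined.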

joblessjunkie · on Jan 14, 2012

It should be possible to build a bot that automatically generates patches and pull requests for these kinds of typos.
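The rewriting step of such a bot could be as small as the sketch below. `fixLenghtTypos` is a hypothetical name, and the rule is deliberately narrow: it only touches property accesses written as `.lenght`, and it would still be wrong for objects that really define a `lenght` property, so a human should review the resulting patch.

```javascript
// Hypothetical fixer: rewrite ".lenght" property accesses to ".length",
// leaving standalone identifiers named "lenght" alone.
function fixLenghtTypos(source) {
  return source.replace(/\.lenght\b/g, '.length');
}
```

Given `if (a.lenght) { var lenght = 1; }`, only the property access changes; the variable declaration is left as it is.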

gren · on Jan 14, 2012

What about this one: https://github.com/search?type=Code&language=JavaScript&...

veyron · on Jan 14, 2012

Someone wrote a spellchecker a while ago using perl spellchecker: http://blog.holdenkarau.com/2011/08/automatic-spelling-corre...

j_baker · on Jan 14, 2012

Equally scary to me is "UFT8".

https://github.com/search?langOverride=&language=&q=...

eik3_de · on Jan 14, 2012

105395 results for heigth, now beat that ;)

flexd · on Jan 14, 2012

And this is why we have testing frameworks.

azth · on Jan 14, 2012

... and compiled languages. Testing won't ensure 100% code coverage.

flexd · on Jan 15, 2012

... and nice things like https://github.com/scrooloose/syntastic for vim (or your editor of choice).

wahnfrieden · on Jan 14, 2012

And static analysis.

davidmccann · on Jan 15, 2012

Recieve. Has to be my number one pet pieve.

https://github.com/search?type=Code&language=JavaScript&...

mrchess · on Jan 14, 2012

This reminds me of a US company I worked with that outsourced some of their service layer work to a company with heavy European influence. As a result, API methods used British spellings of certain words, e.g. getColour() or getFavourites(). Good times.

gus_massa · on Jan 15, 2012

In the LaTeX editor that I'm using (WinEdt), I have a custom color highlighting that marks \rigth and \heigth in red+bold+strikeout, so I don't have to wait to compile and see a strange error to spot the mistake.

andrewcamel · on Jan 14, 2012

It'd be great if Github would scan your code for errors like these and just let you know they exist (in case you didn't want them to, which I would assume you wouldn't for the most part).

justinhj · on Jan 14, 2012

For some reason in video game source code I see the word 'hierarchy' in comments spelt wrong a lot in every project I've been on.

kissickas · on Jan 14, 2012

Is there any context to this or are you just pointing out the humor of how common this is?

cachehit · on Jan 14, 2012

Well it is basically a list of bugs -- and a rather long one too.

Of course there are rare cases where "lenght" is a variable and that name is used in every instance but mostly, these are bugs in code that we all use.

obilgic · on Jan 14, 2012

most of them are variable names which is acceptable!

VMG · on Jan 14, 2012

Not at all. It's irritating and confusing.

Come back to that code in a year and try to extend it, stuff will break because you start to use the correct name.

on Jan 14, 2012

[deleted]

Navarr · on Jan 14, 2012

The first results are all comments and spellcheckers.

udp · on Jan 14, 2012

It's not, because reutrn is a syntax error, but .lenght is valid and would return `undefined`.

jaspervdj · on Jan 14, 2012

Yes it is, it would work fine as a variable. E.g.

  var lenght = 23;
  console.log(lenght);

will not cause any troubles. And many of the results returned by the search are of this kind.

udp · on Jan 14, 2012

Were you replying to me? I explicitly said .lenght would return undefined. As in, a typo on the .length property.

jaspervdj · on Jan 14, 2012

Sorry -- I figure I misunderstood you.

speleding · on Jan 14, 2012

I just tried "heigth", it's almost as bad.