Sorry to keep laboring the point :-) but the other reason I'm pretty sure this is a client bug is that the client doesn't truncate the returned file at the end of the short read, which you'd expect if it actually was treating short read as EOF.
If you copy a 100mb file and the server returns a short read somewhere in the middle of the read stream the file size on the client is still reported as 100mb, which means file corruption as the data in the client copy isn't the same as what was on the server.
That's how this ended up getting reported to us in the first place.
Yes, that's a good point. I agree that there appears to be a client bug here. From a quick glance, it appears that nothing is checking that the non-final blocks in a pipelined read are returned from the server in full.
I don't necessarily agree that retry is the right behavior though. Wouldn't that result in an extra round trip in the actual EOF case? Again, not having thought about this much, it seems a more efficient interpretation of the spec is that truncated reads indicate EOF. In that case, a truncated read in the middle of a pipelined operation indicates either that the file's EOF is moving concurrently with the operation (in which case stopping at the initial truncation would be valid) or that the lease has been violated.
Regardless, I work on SMB-related things only peripherally, so I do not represent the SMB team's point of view on this. Please do follow up with them.
It's only an extra round trip in the case of an unexpected EOF. File size is returned from SMB2_CREATE and so given the default of a RHW lease then (a) the lease can't be violated - if it is, then all bets are off as the server let someone modify your leased file outside the terms of the lease. Or (b) you know the file size, so a short read if you overlap the actual EOF is expected and you can plan for it.
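To make point (b) concrete, here is a rough sketch of how a client could plan for the expected short read at EOF and flag any other one. The function names and the idea of comparing per-block byte counts are illustrative assumptions, not how any real SMB client is structured.

```python
def expected_lengths(file_size, start, block_size, n_blocks):
    """Bytes each pipelined read should return, given the size learned at open."""
    lengths = []
    for i in range(n_blocks):
        offset = start + i * block_size
        remaining = max(0, file_size - offset)
        lengths.append(min(block_size, remaining))
    return lengths


def unexpected_short_reads(file_size, start, block_size, returned):
    """Indexes of blocks that came back shorter than the file size predicts."""
    expected = expected_lengths(file_size, start, block_size, len(returned))
    return [i for i, (want, got) in enumerate(zip(expected, returned)) if got < want]
```

With a 10-byte file read in 4-byte blocks, the third block is expected to be 2 bytes, so only a short first or second block gets flagged.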
A short read in the middle of what you expect to be a continuous stream of bytes should be treated as some sort of server IO exception (which it is). An extra round trip to fetch the missing bytes, which returns either 0 (meaning EOF, so something truncated the file) or an error such as EIO (meaning you got a hardware error), isn't so onerous.
After all this is a very exceptional case. Both Steve's Linux cifsfs client and libsmbclient have been coded up around these semantics (re-fetching missing bytes to detect unexpected EOF or server error) and I'd argue this is correct client behaviour.
As I said, given the number of clients out there that have this bug we're going to have to fix it server-side anyway, but I'm surprised that this expected behavior wasn't specified and tested as part of a regression suite. It certainly is getting added to smbtorture.
Whenever a client gets a short read it needs to issue a request at the missing offset if the caller wanted more bytes. Only if the server returns zero on that read can it assume EOF and concurrent truncation.
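A minimal sketch of that rule, assuming a hypothetical `read_at(offset, length)` callable that stands in for a single read request to the server and may return fewer bytes than asked. This is not the cifs or libsmbclient code, just the shape of the logic; a server error is assumed to surface as an exception from `read_at`.

```python
def read_fully(read_at, offset, length):
    """Read up to `length` bytes, re-requesting whatever a short read left out.

    `read_at(offset, length)` is a hypothetical stand-in for one read request;
    it returns bytes and may return fewer than requested.
    """
    chunks = []
    got = 0
    while got < length:
        data = read_at(offset + got, length - got)
        if not data:
            break  # zero bytes at the missing offset: only now assume EOF
        chunks.append(data)
        got += len(data)
    return b"".join(chunks)
```

The caller gets fewer than `length` bytes only when a follow-up read at the missing offset has actually returned zero.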
We're going to have to fix the Samba server to never return short reads when using io_uring because the clients with this bug are already out there. But if what you're saying is how Microsoft expects the protocol to operate then it needs to be documented in MS-SMB2 because I don't think it's specified this way at the moment.
Well, the man page does say that "The readv() system call works just like read(2) except that multiple buffers are filled".
If we go to read(2) we find "It is not an error if [the return value] is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now [...], or because read() was interrupted by a signal."
As an outsider, I'd never rely on this returning the requested number of bytes. If I required N bytes, I'd use a read loop.
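For what it's worth, the kind of read loop I mean looks roughly like this. It is sketched in Python, where `os.read` is a thin wrapper over read(2); the helper name `read_exactly` is made up for the example.

```python
import os


def read_exactly(fd, n):
    """Read exactly n bytes from fd, looping over short reads.

    Raises EOFError if the stream ends before n bytes arrive.
    """
    buf = bytearray()
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError(f"wanted {n} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)
```

In C you would also have to retry on EINTR yourself; Python's `os.read` already does that.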
But I do agree that the RWF_NOWAIT flag mentioned in your other comment doesn't help, as it suggests the default is to block.
Reminds me of using a slide rule. You normally push the inner part (the C scale) to the right, line up the 1 on the C scale with the first number you're multiplying on the D scale, then look on the C scale for the second number you're multiplying, and read the result off the D scale immediately below that.
But when the result is more than 10, you've wrapped: your answer is off the D scale. So now you have to push the inner part back to the left, and line up the 10 (usually marked as 1, at the right-hand end) on the C scale with the first number on the D scale. And remember to add 1 to the exponent.
I've seen slide rules where the D scale goes slightly beyond 10 (like 10.1), so if the result was just a tiny bit over 10, you wouldn't need to wrap.
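The wrap is easy to see if you think of the scales as logarithms. A toy sketch, with an invented function name, for two factors between 1 and 10:

```python
import math


def slide_rule_multiply(a, b):
    """Multiply two numbers in [1, 10) the way a slide rule does.

    Returns (mantissa, exponent) so that a * b == mantissa * 10**exponent.
    """
    if not (1 <= a < 10 and 1 <= b < 10):
        raise ValueError("both factors must be in [1, 10)")
    position = math.log10(a) + math.log10(b)  # distance along the D scale
    if position >= 1:
        # Off the end of the D scale: use the other index and carry one decade.
        return 10 ** (position - 1), 1
    return 10 ** position, 0
```

For 2 x 3 the position stays on the scale, so the exponent is 0. For 4 x 5 it runs off the end, so you read roughly 2 and add 1 to the exponent.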
Please don't call Samba.tv just 'Samba'. We (the Samba project) have a trademark on Samba, which is for the file server, authentication and print server that runs on Unix boxes and allows interoperability with Microsoft Windows.
Samba.tv is... Something else. It has nothing to do with us.
Unfortunately, I don't think it's very useful to call out the consumer on this -- it might make sense to go after Samba.tv in court to prevent them from using the trademark.
(BTW, I do really appreciate all the work done on samba over the past few decades, so thanks!).
>"Infringement may occur when one party, the 'infringer', uses a trademark which is identical or confusingly similar to a trademark owned by another party, in relation to products or services which are identical or similar to the products or services which the registration covers"
this basically means that in order to have an infringement case, you need to show that both the name/brand/logo are confusingly similar, and also that the product/service/domain are close enough to each other. I don't know that "software" is a sufficiently narrow category to count.
That's exactly the scenario I was thinking of. I've run samba in a household with a (non-smart) TV, back in the day. There's definitely some overlap there.
When I first read this, I assumed Samba had started a file streaming service/app for Android TV to stream movies from Samba shares.
Admittedly, a few seconds of thinking made me realise they'd never do that... but it's easy to see why people are getting confused, which is EXACTLY where Trademark stuff comes in.
I was sorely confused as I thought Samba was legitimately doing this at first. It's literally Samba's name with .tv appended, and one can play files using samba + VLC.
No religious objection. The problem for me with GPLv3 is that it is not compatible with (privately) signed code. If it is possible to run unsigned code on my appliance then my proprietary code would not be secure, putting the entire business in jeopardy. If you can square this circle then I'd love to use it.
I'd be interested to see a list of shipping appliances (meaning not open hardware platforms) with GPLv3 if you know of any.
>No religious objection. The problem for me with GPLv3 is that it is not compatible with (privately) signed code. If it is possible to run unsigned code on my appliance then my proprietary code would not be secure, putting the entire business in jeopardy. If you can square this circle then I'd love to use it.
Your code is still under the full protection of the law. And no signing mechanism will prevent a competitor from simply dumping the flash and reading your code off there, if they really want to - if anything this is probably easier than running their own code on the system. So I don't see what using the GPLv3 changes.
If you're really paranoid, how about running samba in a chroot/jail/etc. where it has access to the data files it needs to serve/store, but not your code? (Your code can operate on the same data from outside the chroot). As long as you make it possible for the user to upgrade samba (which should be fine - you don't care what code runs inside this chroot, because it only has access to the same files the user could access via samba anyway, so the samba that runs in the chroot doesn't have to be signed) you're compliant with the GPL but haven't exposed the rest of your system.
If it is a straight rip-off then the law should protect (at least in the west), but if it were just used (for learning or adapting from) then it could be exceedingly hard to prove or even know about. I suppose what I would be most worried about is if it were leaked such that anyone could use it on any platform without paying. Who would buy an apple TV if you could run it off your raspberry pi? (I know the analogy doesn't quite work-- aTV is decent value as hardware-- but as a start up I will have higher costs so higher prices).
>flash dump
This is why you encrypt the private data on your flash :) Decryption codes can be stored in the processor (it's been a while since I looked at the system- I'll have to look again, but it seemed solid). So that means they'd have to either de-solder the RAM while somehow keeping it freezing cold too, or use an electron microscope or something on the CPU. If they are that capable then I'm sure they could just rewrite the code themselves without my 'help'. I'm not sure how much security compilation would offer, and if the details of that matter, that's something I should look into further. But the above seems pretty solid AFAICT.
>samba in a chroot/jail/etc
Thanks! This is a great idea. IIRC it is possible to break out of a chroot, but (IIRC again) not BSD jails.. so that could be a great option down the line if I am able to use BSD. It adds a fair amount of complexity legally (although it seems sound at first thought) and technically though (can they be hacked?), so perhaps one for later.
By "signed" do you mean a DRM-locked down platform ? There's no problem with signed code, there is a problem with trying to claim ownership of a device that the customer owns :-).
There are many appliances shipping with GPLv3 Samba, Netgear, Drobo, IOmega, Synology, just off the top of my head.
Of course none of these are trying to control what the customer does with the appliance.
If you want to control what customers do with their own hardware, write your own SMB3 server. Good luck with that..
Well thanks for replying I suppose. The reality is that the alternative is that customers get no SMB server, and will have to use other file interfaces. The appliances you mention.. well you didn't actually mention any specifically.. but going by the brands it sounds like you are referring to NAS boxes, in which case they are selling only hardware-- they have no valuable software of their own to protect, 99% just linux+samba, maybe they wrote a trivial control-panel backend. Show me a device that has a competitive advantage from its own software that uses GPLv3 code. 'Signed code only' does not have to mean customers aren't free with the hardware (I'd be quite happy to help wipe the device so they can do whatever they want), it just means potential competitors aren't free to steal my code and waste the investment of xyz man hours that went into that. As a consequence, those that mean no malice will lose freedom with my mix of software on the hardware they own, but they can remain free with their own software on the purchased hardware (as above. Except for GPLv3).
There are several cloud filesystem gateway appliances that contain considerable proprietary code on the appliance, and use Samba GPLv3 software to provide gateway services from SMB clients into the cloud.
I'm not at liberty to name them as I am the NAS boxes as most of them are not forthcoming about their use of Samba to anyone but their customers (to whom they provide replaceable source code of course), whereas the NAS vendors are well known users of GPLv3 Samba.
You seem to be under the impression that avoiding GPLv3 code prevents competitors from buying your box and rendering it down to components, including your precious software, and figuring out any trade secrets you may have.
This is a strange and incorrect impression.
If you genuinely want to use Samba in your proprietary appliance, email me (I'm easy to find). I help companies do this every day as part of my job.
You won't be able to use Samba outside of the terms of GPLv3 of course, but most companies not requiring DRM seem to be perfectly comfortable with that.