Hacker News | DangitBobby's comments

It's not, I don't know why you'd think that.

Maybe we could get enough support behind an amendment to amend the amendment process.

Yep it's more distinctive, more intrusive, spreads further, smells worse.

But is it more unhealthy? The rest are simply adult "preferences".

Are you arguing that my toddler should be okay with it? The point is that it’s not about what I am okay with; it’s about my being responsible for my son and what his adult self might want. We had opinions about the positive health effects of cigarettes in the 1940s and 1950s that turned out to be wrong. There’s a possibility you’re wrong about pot smoking too.

I get that fucking smell everywhere now even while it's still illegal.

I think about this blog post nearly every day. I get what seems like a dozen emails a month from GCP (only 9 in the last 30 days!).

There are influential people who make lots of money when the US Govt forces the country to rely on fossil fuels.

But you _can_ run it on 90% solar plus 10% fossil fuels to achieve 100% power availability, which is what GP and the article suggest.

The issue is that to achieve that you can't just build 90% solar plus 10% fossil fuels. You would need to build 100% solar + 100% fossil fuels for the 10% of the time solar doesn't work.
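A toy version of the capacity point, as a sketch only: the hourly load and solar figures below are made up for illustration and do not come from the article. It shows fossil plants supplying 10% of the energy while still needing capacity equal to the full peak.

```python
# Hypothetical 10-hour profile in GW (made-up numbers, not from the article).
load = [100] * 10            # flat demand
solar = [100] * 9 + [0]      # solar covers everything except one hour

def fossil_requirements(load, solar):
    """Return (fossil capacity needed, fossil share of total energy)."""
    residual = [max(l - s, 0) for l, s in zip(load, solar)]
    capacity = max(residual)             # must cover the worst hour
    share = sum(residual) / sum(load)    # fraction of energy from fossil
    return capacity, share

capacity, share = fossil_requirements(load, solar)
print(f"fossil capacity needed: {capacity} GW of {max(load)} GW peak")
print(f"fossil share of energy: {share:.0%}")
```

Energy share and required capacity are separate quantities, which is why "10% fossil" by energy does not mean 10% as much fossil generation built.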

If you build batteries on the scale that the article suggests (and is probably going to happen in the real future) you can use batteries charged from fossil fuels.

It's a few percent dirtier (round trip losses) but in return you can use gas plants that are 50% more efficient to charge them rather than run peaker plants.
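A back-of-the-envelope sketch of that trade-off. The 40% peaker efficiency and 90% battery round-trip efficiency are assumptions picked for illustration; only the "50% more efficient" ratio comes from the comment.

```python
PEAKER_EFFICIENCY = 0.40       # hypothetical simple-cycle efficiency
CHARGING_PLANT_GAIN = 1.5      # "50% more efficient", per the comment
ROUND_TRIP_EFFICIENCY = 0.90   # hypothetical battery round-trip efficiency

def fuel_per_kwh_delivered(plant_efficiency, storage_efficiency=1.0):
    """kWh of fuel energy burned per kWh that reaches the grid."""
    return 1.0 / (plant_efficiency * storage_efficiency)

peaker = fuel_per_kwh_delivered(PEAKER_EFFICIENCY)
via_battery = fuel_per_kwh_delivered(
    PEAKER_EFFICIENCY * CHARGING_PLANT_GAIN, ROUND_TRIP_EFFICIENCY
)
print(f"peaker:      {peaker:.2f} kWh fuel per kWh delivered")
print(f"via battery: {via_battery:.2f} kWh fuel per kWh delivered")
```

Under these assumed numbers the battery route burns roughly a quarter less fuel per delivered kWh, despite the round-trip loss.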

And of course that's ignoring wind which is nearly as cheap as solar and anti-correlated with it.


That's fair, batteries are somewhat useful for peaking even in a world powered 100% by fossil fuels so there's some infrastructure that can be shared. And even on a cloudy day solar output isn't 0%. But I'm skeptical the overlap here is significant enough to invalidate my basic point, though I admit it's a big simplification.

Reality is extremely complicated, so realistically the exact mix of solar + fossil fuels that makes sense is going to depend on a huge number of factors and vary from region to region depending on weather, fuel costs, construction costs, transmission costs, and probably a thousand other things I haven't thought of. The best thing to do is stay out of the way of both industries and let the market sort all of that complexity out.

I would speculate the result of that is going to be a lot more renewables than currently exist, mainly due to the drastic reduction in the cost of solar and batteries that has been occurring over the last few decades, but I don't think it'll be 100% or even 90% renewables either (except perhaps in the extremely long term). Time will tell.


It helps that the cost of a simple cycle gas turbine power plant (before the recent data center demand spike) is around $600/kW, maybe a factor of 20 cheaper per kW than a nuclear power plant. So backing up the whole grid with such generators wouldn't be that expensive.
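To put rough numbers on that, here is the arithmetic as a sketch. The $600/kW figure and the factor of 20 are the ones quoted above; the 100 GW peak load is a hypothetical grid size chosen only to make the multiplication concrete.

```python
GAS_TURBINE_USD_PER_KW = 600   # figure quoted in the comment
NUCLEAR_FACTOR = 20            # "maybe a factor of 20", per the comment
PEAK_LOAD_GW = 100             # hypothetical grid size, not from the thread

def backup_capex_usd(peak_gw, usd_per_kw):
    """Capital cost of enough capacity to cover the whole peak."""
    return peak_gw * 1_000_000 * usd_per_kw   # 1 GW = 1,000,000 kW

gas = backup_capex_usd(PEAK_LOAD_GW, GAS_TURBINE_USD_PER_KW)
nuclear = backup_capex_usd(PEAK_LOAD_GW, GAS_TURBINE_USD_PER_KW * NUCLEAR_FACTOR)
print(f"gas turbines: ${gas / 1e9:.0f}B")
print(f"nuclear:      ${nuclear / 1e9:.0f}B")
```

This covers construction cost only; fuel, maintenance, and financing are left out.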

Good thing it's already built then! Well, of course it costs money to maintain, though.

Yes, but if you need to have all that infrastructure anyway it no longer makes sense to compare the cost of solar+batteries with the cost of fossil fuels because you actually need to have both.

If you compare the total cost of solar with just the fuel cost of fossil fuels (ignoring its CapEx and non-fuel OpEx) that swings the equation a lot.


Infrastructure cost for 100% is the same as infrastructure cost for 10%? That's not true. The distribution network is the part that can't be scaled, but it can also be reused for either source, so it doesn't double in cost.

No, I'm saying infrastructure cost for 100% is the same as infrastructure cost for 100%. You can't build 10% as much fossil fuel infrastructure and expect it to carry 100% of the load when solar isn't working. And obviously I'm talking about generation here, not distribution.

That's not carbon neutral. You can use synthetic fuels to make it fully carbon neutral (way easier to store than the often-proposed H2) but that's really just another battery.

I guess I have pandas brain because I definitely want to drop duplicates, 100% of the time I'm worried about duplicates and 99% of the time the only thing I want to do with duplicates is drop them. When you've got 19 columns it's _really fucking annoying_ if the tool you're using doesn't have an obvious way to say `select distinct on () from my_shit`. Close second at say, 98% of the time, I want to get a count of duplicates as a sanity check because I know to expect a certain amount of them. Pandas makes that easy too in a way SQL makes really fucking annoying. There are a lot of parts of pandas that made me stop using it long ago but first class duplicates handling is not among them.

And the API is vastly superior to SQL in some respects from a user perspective despite being all over the place in others. Dataframe select/filtering e.g. `df = df[df.duplicated(keep='last')]` is simple, expressive, obvious, and doesn't result in bleeding fingers. The main problem is the rest of the language around it with all the indentations, newlines, loops, functions and so on can be too terse or too dense and much harder to read than SQL.
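For anyone who hasn't used it, a minimal sketch of the two operations described above (count the duplicates, then drop them). The DataFrame contents are made-up stand-ins for a messy CSV export.

```python
import pandas as pd

# Hypothetical rows standing in for a messy CSV export.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "name": ["a", "a", "b", "c", "c", "c"],
})

# Sanity check: how many rows repeat an earlier, fully identical row?
n_extra = int(df.duplicated().sum())

# The "select distinct *" step: keep one copy of each identical row.
deduped = df.drop_duplicates()

print(n_extra, len(deduped))
```

Both calls compare all columns by default, so a 19-column frame needs no column list.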


Duplicates in source data are almost always a sign of bad data modeling, or of analysts and engineers disregarding a good data model. But I agree that this ubiquitous antipattern that nobody should be doing can still be usefully made concise. There should be a select distinct * operation.

And FWIW I personally hate writing raw SQL. But the problem with the API is not the data operations available, it's the syntax and lack of composability. It's English rather than ALGOL/C-style. Variables and functions, to the extent they exist at all, are second-class, making abstraction high-friction.


Oooh buddy how's the view from that ivory tower??

But seriously I'm not always in control of upstream data, I get stuff thrown over to my side of the fence by an organization who just needs data jiggled around for one-off ops purposes. They are communicating to me via CSV file scraped from Excel files in their Shared Drive, kind of thing.


Do what you gotta do, but most of my job for the past decade has been replacing data pipelines that randomly duplicate data with pipelines that solve duplication at the source, and my users strongly prefer it.

Of course, a lot of one-off data analysis has no rules but to get a quick answer that no one will complain about!


I updated my OG comment for context. As an org we also help clients come up with pipelines but it's just unrealistic to do a top-down rebuild of their operations to make one-off data exports appeal to my sensibilities.

I agree, sometimes data comes to you in a state that is beyond the point where rigor is helpful. And for some people that kind of data is most of their job!

Duplicates are a sign of reality. Only where you have the resources to have dedicated people clean and organize data do you have well modeled data. Pandas is a power tool for making sense of real data.

> Duplicates in source data are almost always a sign of bad data modeling

Nope. Duplicates in source data (INPUT) are natural, correct, and MUST be supported, or almost all data becomes impossible.

The actual problem is the OUTPUT. Duplicates in the OUTPUT need to be controlled and explicit. In general, we need a row in the OUTPUT to be unique by an N-key, but probably don't need it to be unique for the rest, so, in the relational model, you need uniqueness for a combination of columns (rarely, by ALL of them).
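A minimal sketch of "unique by an N-key" on the output side, using plain dicts. The function name `unique_by_key` and the sample columns are hypothetical, and keeping the last row seen per key is just one possible policy.

```python
def unique_by_key(rows, key_columns):
    """Keep the last row seen for each combination of key_columns."""
    latest = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        latest[key] = row   # later rows overwrite earlier ones with the same key
    return list(latest.values())

# Hypothetical input: duplicates allowed, the non-key column may differ.
rows = [
    {"customer": "acme", "day": "mon", "total": 10},
    {"customer": "acme", "day": "mon", "total": 12},
    {"customer": "acme", "day": "tue", "total": 7},
]
output = unique_by_key(rows, ("customer", "day"))
print(output)
```

The key columns are stated explicitly, so which duplicates are collapsed is a visible decision and not a side effect.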


You articulate your case well, thank you!

I always warn people (particularly junior people) though that blindly dropping duplicates is a dangerous habit because it helps you and others in your organization ignore the causes of bad data quickly without getting them fixed at the source. Over time, that breeds a lot of complexity and inefficiency. And it can easily mask flaws in one's own logic or understanding of the data and its properties.
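One way to make dropping less blind, as a sketch: count the duplicates first and fail loudly when there are more than expected. The function names and the `max_extra` threshold are hypothetical, and rows are assumed to be hashable tuples.

```python
from collections import Counter

def duplicate_report(rows):
    """Return {row: count} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

def drop_duplicates_checked(rows, max_extra):
    """Drop exact duplicates, but raise if there are more than expected."""
    extra = len(rows) - len(set(rows))
    if extra > max_extra:
        raise ValueError(
            f"{extra} duplicate rows, expected at most {max_extra}: "
            f"{duplicate_report(rows)}"
        )
    return list(dict.fromkeys(rows))   # preserves first-seen order

rows = [("a", 1), ("b", 2), ("a", 1)]
cleaned = drop_duplicates_checked(rows, max_extra=1)
print(cleaned)
```

The error message names the offending rows, which gives someone a starting point for fixing the source.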


When I'm in pandas (or was, I don't use it anymore) I'm always downstream of some weird data process that ultimately exported to a CSV from a team that I know has very lax standards for data wrangling, or it is just not their core competency. I agree that duplicates are a smell but they happen often in the use-cases that I'm specifically reaching to pandas for.

Exactly. It’s not that getting rid of duplicates is bad, it’s that they may be a symptom of something worse. E.g. incorrect aggregation logic.

When you "steal" a secret, it's no longer a secret. When you "steal" credit, the original thinker no longer gets credit. In both cases, the thing itself was destroyed: in the former, the secret is no longer a secret at all and in the latter the boss will no longer be considered the mastermind behind the idea. When you "pirate" something the original copy remains and the creator retains it and the rights to sell copies of it and will still benefit from selling copies. It's not theft.


I've recently implemented hooks that make it impossible for Claude to use tools that I don't want it to use. You could consider setting up a tool that errors on any unsafe use of sed (or any use of sed if there are safer tools).
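A rough sketch of what such a hook could look like; this is not my actual setup. It assumes the hook receives the pending tool call as JSON on stdin with `tool_name` and `tool_input.command` fields, and that exit code 2 blocks the call with stderr passed back to the agent. Check those details against the hooks documentation for your version. The regex is a crude heuristic for in-place sed, not a shell parser.

```python
import json
import re
import sys

# Hypothetical policy: block in-place sed edits (-i, combined flags like -ni, --in-place).
UNSAFE_SED = re.compile(r"\bsed\b[^|;&]*\s(-[a-zA-Z]*i|--in-place)")

def check_command(command):
    """Return an error message if the command is disallowed, else None."""
    if UNSAFE_SED.search(command):
        return "in-place sed is blocked; use the file editing tool instead"
    return None

def handle_event(event):
    """Return the exit code for one tool-call event (field names are assumed)."""
    if event.get("tool_name") != "Bash":
        return 0
    message = check_command(event.get("tool_input", {}).get("command", ""))
    if message:
        print(message, file=sys.stderr)
        return 2   # assumed convention: this exit code blocks the tool call
    return 0

def main():
    # When installed as a hook script, call sys.exit(main()).
    return handle_event(json.load(sys.stdin))
```

Read-only uses such as `sed -n '1p' file` pass through, since only the in-place flags are matched.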

