Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Ask HN: Who has deployed commercial features using GPT4?
80 points by _false on April 15, 2023 | hide | past | favorite | 47 comments
I'd be curious to learn how effective has GPT4 been to enable product features and what it means for the things we might see in the future.

In particular, I have the following questions:

1. What was the product you were working on?

2. Were there any new software engineering challenges that came from working with GPT4 (e.g. output quality, testing, monitoring, etc.)?



Great question!

1. I've observed multiple products across customers.

1.1 Correcting or filling missing information in structured data. For example a system to suggest corrections to products in a company catalogue (each product category has different schema). Unstructured data is pulled from various websites and optionally from categories retrieved from images. It is then compared against the data and most probable fixes are reported.

Most of the work is done by a few polite prompts to GPT3.5/4 (~5 English sentences in total)

1.2 Better search company data. E.g. a chat bot for internal documentation that can also access internal services in order to answer a question. Same ~5 English sentences to do bulk of the work.

1.3 (non commercial) Endangered language preservation. Building a smart agent that is accessible via chat/hardware (like Alexa/Homepod), that talks in native language can understand and helps to preserve the culture. This is a complex one.

2. Tech stack itself is rather simple. Mostly - GPT, LangChain/LlamaIndex, Vector database with embeddings for memory, plugins for external services and potentially agents to drive workflows.

Output quality, testing, monitoring, scalability etc also don't differ much from operating normal "old-school" ML models. If anything, it feels simpler.

The tricky part is that the entire notion of LLM-driven micro-services is new. Quality of the resulting product largely depends on knowing prompting tricks and following the latest news in an area.

Plus the biggest challenge that customers want to be solved: "How can I ran it on my hardware?"


Can you sell me on langchain? To me it seemed useful for building gimmicky examples, but didn’t seem very easy to customize


I came to the same conclusion.


How do Embeddings work for you? I'm yet to play with them, but I imagine: - with large corpus of text, they can be expensive to compute

- they may be limited in some cases - when you want to extract a deeper meaning from text - that is - gpt3/4 would figure out that an answer to a query is in a given document, but embeddings are way more shallow and will miss it

I know they are far better than previous search approaches, but do they have the limitations above?


We are building LandHive AI, which enables you to generate websites by simply promt the system with a website title and a content briefing. See a demo: https://youtu.be/0S5rU0odTOk

My biggest challenge using the api so far was that the output is not reliable. Randomly from time to time it outputs notes and comments even tough I asked to only reply in a code block. Also if I rerun exactly the same promt, it can output something completely different (different content is fine, but I teach chatgpt to follow a structure - it works in 90% of the cases just fine). I’m using 3.5 not 4.

The api can be down regularly. This is annoying especially if you have longer conversations. I had a hard time to resume a conversation. I usually restart the whole process.

However, the overall capabilities are mind blowing. The system surprises me very often.

https://landhive-ai.netnode.ch/home


I’ve found v4 to be much better at following formatting. If I put “reply with code only” in the system message then I don’t get all the explanations.

The system message is completely useless in 3.5 though and you’re better off putting your instructions in the first user message.

I haven’t used v4 in production yet though because it is so slow. I really hope they speed it up.


We were already using LLM’s at https://nureply.com for creating personalization for cold emails but GPT-4 enables us to create more specific and engaging icebreakers for our users.

Challenge comes in pricing and getting a good result. Generally longer the prompt, better the results but you have to adjust accordingly.

Also, generally using only GPT-4 doesn’t make sense. Mix and match between 2 different models make sense. (e.g. data extraction can be done with GPT-3.5 but writing a good email should be done with GPT-4)


How do you manage employee morale and burnout, when you're building software that enables faking "being human" in order to fool people into clicking more?

Is it a matter of everyone being in "don't think about it" mode and focusing on how the technology "enables" something and how happy your customers are? Something something engagement.

> Upload your leads into Nureply and instantly scrapes their personal public data to generate unique personalized first lines that you can use for your cold email icebreaker in seconds

Gross. I'm sure you're GDPR-compliant, yeah? And by GDPR-compliant, I don't just mean having a CYA paragraph in your privacy policy, but rather gathering informed consent from the "uploaded leads" about the processing of their personal data, and having THEM agree to the privacy policy, mmyea?

So, to reiterate my questions: How do you manage morale in a company that creates a not-only dubious/immoral but, in parts of the world, straight up illegal product?


Consent isn’t required for B2B data processing and cold contact as far as I know, like it is for consumer data. There are six lawful basis including consent. The most relevant one in this case is probably legitimate interest. A company can cold contact someone at an organisation if they believe there is a genuine possibility they would buy the product or service they offer. IANAL however.

PS What’s CYA?


"CYA" means "Cover Your Ass", which is used when you have lawyers/compliance people using vague and over-reaching language to "cover their ass", without anything meaningful behind it.

> Consent isn’t required for B2B data processing and cold contact as far as I know, like it is for consumer data.

From what I can tell, the way this company ingests data uses personal channels and lookups. So although a company would be using their services and they are b2b, the targets of those cold emails are getting their personal data ingested by a company they have no knowledge of, and they are for sure not made aware of it in those cold emails.

If those targets are exclusively employees of a company, their professional email is being targeted, and only their professional profiles are being scraped, I would bet it's defensible. I am certain it's not the case though.


Saying illegal doesn’t make a product illegal. We are getting this statement a lot but still nobody can prove it.

Also, what is immoral? How are you so sure we are doing something immoral?

Employees are feeling great and more hyped than I am about the product. They see the value behind what we are doing. When a small business owner from a small country sends you a thank you message, we all want to work on our product more.

We are getting endless messages from small businesses around the world for showing their gratitude. Most of them have great product or service they want to sell and not good at copywriting. They don’t have enough money to hire someone to chase customers/clients but by using our product, they can finally sell their goods and start a business relationship.

We are not getting any contact information from the web. We are just checking their public social profile and website. Where in the GDPR or any similar law this is illegal?

Customer brings their own data. For example, one of my customers got this data from their embassy. They collected the data from companies who wants to buy specific type of material but couldn’t find someone to sell to them. They left their contact information and 100s of other companies followed this. As a result, my customer have 1000s of contact data, which are given with consent, waiting to be contacted.

This customer mentioned above just started their business and their product is awesome but have nobody to reach out those people. Our product fills this hole.

I understand your hate towards what you don’t understand but world is not about just EU or US. We are not breaking the law any means but if you think we do, I want you to prove it so we can fix it.


"I just want to see how European people see the world." - Here is a German perspective:

Do you think more than 50%, or even more than 20%, of the contact details in that data want to be contacted in this manner?

It's stressful having to discard spam for at least an hour later on a Friday than in times past, when you would rather be with your family - I can't see a way this helps the world. I can assure you multiple side projects I run have never given consent to you or other similar services, yet are continuously spammed in this manner.

If you don't care about the receivers, think of the organic and legitimate senders that are not using such GPT-spam techniques. I now discard a higher proportion of people as false positives - perhaps local students keen to learn - who contact me genuinely, because of services like yours.

To truly reflect, imagine this scenario: Would you be up for going to a careers fair in Germany, entirely populated with young people, feeling sad, never having met or heard back a word of advice, because of your changes to human communication? People who never asked or heard of your US "tool".


> We are getting this statement a lot but still nobody can prove it.

Oh dude, take the hint maybe.

> Where in the GDPR or any similar law this is illegal?

You're handling the personal information of, I'm guessing, ~4-6 magnitudes of people who haven't agreed to your privacy policy and data handling practices. You're a lawsuit waiting to happen.


Seems like you know my business more than me, my lawyers and investors apparently. Would you like to join our board as a legal advisor?


Do you really want to pay a 5k EUR consulting fee for me to tell you how you're going to have to seriously rethink your data handling practices? Because if you do, my email's on my profile; but you already got the gist of it for free...


No. I just want to see how European people see the world.


IMO one of the killer use cases of GPT is reformatting information from any format X to any other format Y, and we're using this superpower in the relatively "boring" space of data extraction: https://kadoa.com can turn any website into an API.


I'm in the market for this. How do you get around scrape blockers? For example Target and Walmart are tricky to get the markup for even with services that specialize in this like scrapfly.

Do you guys have indie friendly pricing? I don't have $500/month to spend but could do $20/month.


For $20 you can get over 10k requests from scraping fish: https://scrapingfish.com/buy


Lol $20 won't buy you a turkey sandwich these days


https://scrapfly.io/pricing

https://scrapestack.com/product

https://www.scrapingbee.com/#pricing

There's quite a few products in this segment around that price range.


2023 In an airport

2024 In a cafe

2025 In a posh supermarket

2026 In any supermarket


I tried this with a textual list of oublic grilling spots in my city and it got it completely wrong, hallucinating with high deceiving confidence.

How do you prevent this?


We use LLMs to semantically understand the website and generate the scrapers code for it, not for the actual data extraction (which would be too expensive anyways). We also have checks in place to verify that the extracted data truly exists on the website.


Did you ask for grilling spots, or did you paste the grilling spots from some database and ask it to summarize/reformat?

Parent commenter was talking about the latter. If you did the former, it will hallucinate like crazy, of course.


Watch out / warn users about possible mistakes. With long numbers I've seen it modify them slightly.


love this! signing up


We have it as an option in DemoTime

For those unaware DT produces a highlight-reel video after every software sales meeting.

Not sure if LLM will go on by default. The algorithmic version of DT is super strong, so just generating the scripts with GPT is MUCH worse.

For us the correct usage is to sprinkle in GPT, e.g. to also add a section to the output video which summarizes the user's goals


I'm also curious to know what business problem did you solve with GPT that you couldn't previously or as effectively?

So far I've seen a ton of cool demos, but not much real life business use cases.


We're integrating GPT3/4 functionalities into our hardware engineering SAAS: https://www.valispace.com/ai/

It mainly helps with 2 things:

- allowing engineers to develop their products much faster (especially doing good requirements engineering for now)

- allowing us to demo to/onboard users with data from their specific usecase (prepopulate their trial account)

Hardware engineering at first does not seem like an obvious choice for LLMs, but I think that it will be those vertical solutions that will still surprise us all the most.

Here are some more details, how hardware design gets concretely aided by LLMs: https://assistedeverything.substack.com/p/todays-ai-sucks-at...


I work for Intercom.com, we are currently adding GPT 4 to the support bot

https://www.intercom.com/ai-bot

Looks pretty cool from what I’ve seen


You're doing a terrific job, seriously! <3

We were doing a workshop about LLMs for 110 people a week ago, and we spent 10 minutes explaining how you do stuff - as an example on how it should be done well.


It says answers instantly but there’s no spinner or anything. Waited a minute and no reply.


You have the worst cookie modal. I gave up. Would have thought better from intercom.com


1. I'm integrating ChatGPT extensively into https://CoCalc.com. This integration makes a lot of sense, because cocalc is a platform in which relatively inexperienced students use Jupyter notebooks, linux terminals and Latex. So far, the most popular feature by far is a "Help me fix this" button that appears above stacktraces in Jupyter notebooks.

2. One software engineering challenges is that ChatGPT often outputs code in markdown blocks. I've had to emphasize in prompts that it should explicitly mark the language. I then got inspired to make it possible to evaluate in place the code that appears in these blocks using a Jupyter kernel, and spent a week making that work (so, e.g., if you type a question into the chatgpt box on the landing page at https://cocalc.com, and code appears in the output, often you can just evaluate it right there). There seem to be endless surprises and challenges though. For example, a few minutes ago I realized that sometimes the giant tracebacks one gets when using Python in Jupyter notebooks are so big (even doing simple things with matplotlib) that they end up resulting in too much truncation: https://github.com/sagemathinc/cocalc/issues/6634

3. I'm mostly using GPT-3.5-turbo rather than GPT4, even though I have a GPT4 api key. Aside from costs, GPT4 takes about 4x as long, which often just feels too long for my use case. The average time for a complete response from GPT-3.5 for my application is about 8 seconds, versus over 30s for GPT4.


We built Hex’s Magic features using GPT-4. You can generate, edit, debug, and explain SQL and Python. We have a few hundred people using it every day, and are opening it more broadly soon.

https://hex.tech/magic


Based on the demo this looks really cool. I've applied to waitlist!

How did your software engineering workflow differ when using GPT4 (as compared standard software engineering workflows)?

How did you test the code that makes heavy use of generation?


Heads up your site is broken on iOS Safari. The background animation loads but no text.


We have integrated it into our AskAi feature that allows customers to answer natural language questions about the outcome of a phone call. So for example, “was there an appointment scheduled in the phone call and what was the time scheduled?” Or “what was the final quoted value in the following conversation?” We can then take the structured outputs and use them for conversation tracking with google ads. This is a game changer when so many in the industry still rely on call length to measure a positive customer / lead interaction.

We are ctm.app


Hmm, are you working B2B or B2C? Seems like you could run the call through OpenAI's Whisper then cross reference times and darws with the clients calendar/crm to track conversions at a lower cost, and validate the value of the work performed


We're using GPT 4 to create personalised content for product demo videos for sales teams. Demo here: https://www.linkedin.com/feed/update/urn:li:activity:7049764...

Even though it's much slower, GPT 4 is way more consistent than 3.5. The OpenAI APIs have had lot of flakiness in the past couple of weeks, we retry requests up to 10 times to work around this


Re (2): GPT4 requires engineering for resiliency. The API (currently) has availability issues and high variance in latency. It’s not (yet) ready for interactive use. For other use cases, a good queuing and retry strategy is necessary.


We have deployed gpt4 in multiple layers of our stack at https://88stacks.com using it for tooling, marketing, and other places.


Dude. You gotta stay pre-product. Pre-revenue? No no no. If you’re pre-product your valuation is only capped by their imaginations.


1) Have used it to rewrite a couple prod functions. Can't/won't go into details. 2) It cannot write advanced logic so use with care.


Also even simple stuff it produces buggy code.


yes i used it to give multiple text variants to a marketing tool, it was very simple since i doesnt have direct user input. just do httpcall get text from database have a solid prompt to give alternatives and off we went.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: