
Rutledge · 2026-04-01T22:05:23 1775081123

Scorecard AI | Founding Engineer | San Francisco (Onsite) | Full-Time | https://scorecard.io

Scorecard builds simulation environments and reward models that frontier AI labs and enterprises use to train and evaluate their agents. Same discipline that made self-driving cars work, applied to LLM-based agents.

Our founding team built simulation infrastructure at Waymo, SpaceX, and Uber ATG. I'm Dare, the CEO; I scaled Waymo's simulation org to 200+ engineers. We're ~7 people, 7-figure revenue, $3.75M seed led by Kindred Ventures with angels from OpenAI, Apple, Waymo, Uber, Perplexity, and Meta.

Founding Engineer ($175K-$280K + strong equity). Primarily backend. You'll own technical domains end-to-end. We use agentic tooling (Claude Code) heavily. Moving fast and working with customers matters more than deep expertise in any one language.

Stack: TypeScript, Node.js, Next.js, PostgreSQL, ClickHouse, Temporal, GCP.

jobs@scorecard.io

Rutledge · 2026-01-02T23:14:10 1767395650

Scorecard is the simulation platform for self-improving AI agents. We help teams encode expert judgment into reward models and run 10,000s of scenarios in minutes instead of reviewing 10s of production cases over weeks.

Our team built simulation systems at Waymo, Uber ATG, and SpaceX. Same discipline, applied to AI agents. Backed by Kindred Ventures and Neo, with multi-billion dollar customers.

Founding Software Engineer ($175k-$250k + equity)

Build infrastructure for large-scale agent simulation, reward model pipelines, and scenario generation. Ship fast with a low-ego team.

Stack: TypeScript, React, Node.js, Postgres. Bonus: LLM/RL experience, founder background.

https://www.scorecard.io/careers/software-engineer

Founding GTM Lead

Own demos, sales playbook, and positioning for AI teams building frontier agents. 3+ years early-stage sales/product marketing. AI/developer tools experience a plus.

https://www.scorecard.io/careers/founding-gtm

In-person in SoMa. Full benefits, daily lunch, unlimited PTO.

Apply: jobs [at] scorecard.io (mention HN)

Rutledge · 2025-11-03T18:09:15 1762193355

Scorecard | Founding Engineer, Founding UX Designer, Founding GTM | SF, CA ONSITE | Full-time

Scorecard is building the leading platform for testing, evaluating, and monitoring AI applications. We help teams ship reliable AI products faster—from prototype to production. Our customers include developers and enterprises building with LLMs who need confidence their AI agents perform as expected.

We recently raised $3.75M in seed funding from Kindred Ventures, Neo, and angels from OpenAI, Google, and Meta: https://www.businessinsider.com/scorecard-raises-millions-ki...

See how we're helping enterprises like Thomson Reuters ensure their AI agents are production-ready: https://www.thomsonreuters.com/en-us/posts/innovation/from-t...

Tech Stack: Full TypeScript with Next.js, Express, React, and PostgreSQL, plus agents like Claude Code and Gemini for review.

We're an early-stage, fast-growing team tackling the most pressing problems in the AI reliability space. If you're excited about being a founding team member at a company defining how the industry evaluates and optimizes AI systems, we'd love to hear from you.

Open Roles:

- Founding Software Engineer: Build the core platform that helps developers test and evaluate AI agents at scale
- Founding UX Designer: Design intuitive experiences that make complex AI evaluation accessible to all developers
- Founding GTM: Help define and execute our go-to-market strategy as we scale with customers

Learn more and apply: jobs@scorecard.io w/ subject 'HN'

Rutledge · 2025-09-30T07:13:08 1759216388

I call them 'CLI agents'!

Rutledge · 2025-06-25T06:31:44 1750833104

Here's the image from Wayback: https://web.archive.org/web/20250625051706/https://blog.goog...

The biggest diffs from Claude Code (the current champion):
1. Generous free tier (60 RPM!)
2. Open source under Apache (standard after OpenAI Codex did the same)

Rutledge · 2025-05-30T20:07:46 1748635666

Hi HN, we're excited to launch the first remote MCP server for LLM evaluation, for claude.ai and Cursor. Would love your thoughts and feedback :)

Rutledge · 2025-05-30T20:09:26 1748635766

Aannnnndd X is down x) Here's the LI: https://www.linkedin.com/posts/scorecard-ai_introducing-scor...

Rutledge · on March 8, 2025

Here's the repo: https://github.com/agntcy and docs: https://docs.agntcy.org/pages/abstract.html

Rutledge · on Feb 27, 2025

This initiative is designed to be community-driven, so we're looking forward to your feedback on what agent benchmarking needs exist in your domains. While starting with legal AI, we plan to expand across industries where benchmarks for evaluating AI agents are needed.

Rutledge · on Feb 17, 2025

The concurrent request handling seems great for our AI eval workloads, where we're mostly waiting on LLM API calls and DB operations, but I'm curious how Vercel handles potential noisy-neighbor issues when one request consumes excessive CPU/memory.

Disclosure: I'm CEO of Scorecard (an AI eval platform) and a current Vercel customer. Intrigued, since most of our serverless time is spent waiting for model responses, but cautious about 'magic' solutions.

schniz · on Feb 17, 2025

We built Fluid with noisy neighbors (i.e., requests to the same instance) in mind. Because we are a data-driven team, we:

1. track metrics and have our own dashboards to ensure we proactively understand and act whenever something like that happens
2. also use these metrics in our routing to smartly know when to scale up. we have tested many variations of all the metrics we gather, and things are looking good

anyway, the more workload types we host with this system, the more we learn and the better it will perform. we've been running this for a while now, and it's showing great results.

there's no magic, just data coming from a complex system, fed into a fairly complex system!

hope that answers the question, and thanks for trusting us
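The idea of feeding per-instance metrics into routing and scale-up decisions can be pictured with a toy router. This is a minimal sketch under loud assumptions: the metric names (`inFlight`, `cpuUtil`, `memUtil`), the thresholds, and the `route` function are hypothetical illustrations, not Vercel's Fluid API or algorithm. A request goes to the least-loaded instance that still has headroom on every resource, so a CPU- or memory-hot instance (a noisy neighbor) is skipped; if no instance has headroom, the router signals a scale-up.

```typescript
// Hypothetical per-instance metrics; field names are illustrative, not Vercel's.
interface InstanceMetrics {
  id: string;
  inFlight: number; // requests currently being served by this instance
  cpuUtil: number; // fraction of CPU in use, 0..1
  memUtil: number; // fraction of memory in use, 0..1
}

// Hypothetical thresholds at which an instance counts as saturated.
interface Limits {
  maxInFlight: number;
  maxCpu: number;
  maxMem: number;
}

type RoutingDecision =
  | { kind: "route"; instanceId: string }
  | { kind: "scaleUp" };

// An instance has headroom only if it is under every limit.
function hasHeadroom(m: InstanceMetrics, limits: Limits): boolean {
  return (
    m.inFlight < limits.maxInFlight &&
    m.cpuUtil < limits.maxCpu &&
    m.memUtil < limits.maxMem
  );
}

// Load score: utilization of the most constrained resource, relative to its limit.
function loadScore(m: InstanceMetrics, limits: Limits): number {
  return Math.max(
    m.inFlight / limits.maxInFlight,
    m.cpuUtil / limits.maxCpu,
    m.memUtil / limits.maxMem,
  );
}

// Routes to the least-loaded instance with headroom; asks for a new instance otherwise.
function route(instances: InstanceMetrics[], limits: Limits): RoutingDecision {
  const candidates = instances.filter((m) => hasHeadroom(m, limits));
  if (candidates.length === 0) {
    return { kind: "scaleUp" };
  }
  let best = candidates[0];
  for (const m of candidates.slice(1)) {
    if (loadScore(m, limits) < loadScore(best, limits)) {
      best = m;
    }
  }
  return { kind: "route", instanceId: best.id };
}

// Example: instance "a" is CPU-hot (a noisy neighbor), so traffic goes to "b".
const limits: Limits = { maxInFlight: 10, maxCpu: 0.8, maxMem: 0.8 };
console.log(
  route(
    [
      { id: "a", inFlight: 3, cpuUtil: 0.95, memUtil: 0.2 },
      { id: "b", inFlight: 6, cpuUtil: 0.3, memUtil: 0.4 },
    ],
    limits,
  ),
); // { kind: 'route', instanceId: 'b' }
```

Scoring by the most constrained resource is just one simple choice; a router could also smooth metrics over a time window instead of acting on a single snapshot.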

alex12de · on Feb 19, 2025

So if I understood point 1 correctly, I could use this solution to potentially save money, but it could turn into a nightmare very quickly if you guys aren't watching?

Rutledge · on Feb 19, 2025

Yes, quite helpful. Thanks for explaining, and I'll try it out!

tuananh · on Feb 17, 2025

I think the majority of Vercel customers are doing website hosting, and most web requests are I/O-bound, so it makes sense to handle multiple requests per microVM.

Can't say the same if the customer is running a CPU-bound workload.
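The I/O-bound versus CPU-bound distinction can be seen in a single Node.js event loop, used here as a rough stand-in for one instance serving several requests (an assumption; it says nothing about how Fluid actually schedules work inside a microVM). The hypothetical `ioBound` handler simulates waiting on an LLM API call with a timer, so all requests are in flight at once; the hypothetical `cpuBound` handler does synchronous work that never yields, so requests run one at a time. The `Tracker` type and `runBatch` helper are also illustrative names.

```typescript
// Counts how many simulated requests are in flight at the same time.
interface Tracker {
  inFlight: number;
  peak: number;
}

function enter(t: Tracker): void {
  t.inFlight++;
  t.peak = Math.max(t.peak, t.inFlight);
}

function leave(t: Tracker): void {
  t.inFlight--;
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical I/O-bound handler: the timer stands in for waiting on a model API or DB.
// While it awaits, the event loop is free to start other requests.
async function ioBound(t: Tracker): Promise<void> {
  enter(t);
  await sleep(50);
  leave(t);
}

// Hypothetical CPU-bound handler: a synchronous loop with no await,
// so no other request can make progress until it finishes.
async function cpuBound(t: Tracker): Promise<number> {
  enter(t);
  let acc = 0;
  for (let i = 0; i < 2_000_000; i++) {
    acc = (acc + i) % 1_000_003;
  }
  leave(t);
  return acc;
}

// Starts n requests "concurrently" and reports peak concurrency and wall time.
async function runBatch(
  handler: (t: Tracker) => Promise<unknown>,
  n: number,
): Promise<{ peak: number; ms: number }> {
  const t: Tracker = { inFlight: 0, peak: 0 };
  const start = Date.now();
  await Promise.all(Array.from({ length: n }, () => handler(t)));
  return { peak: t.peak, ms: Date.now() - start };
}

async function main(): Promise<void> {
  const io = await runBatch(ioBound, 10);
  const cpu = await runBatch(cpuBound, 10);
  // io.peak is 10, and io.ms is roughly one 50 ms wait rather than ten of them.
  // cpu.peak is 1: the loops ran back to back.
  console.log("io-bound:", io);
  console.log("cpu-bound:", cpu);
}

void main();
```

Packing many requests onto one instance pays off when handlers spend most of their time awaiting, as in `ioBound`; with `cpuBound`-style handlers, the extra requests simply queue behind the busy one.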

Rutledge · on Dec 17, 2024

This is great :) and pretty impressive that it was possible in Coda!