boricj's comments

I'm working on ghidra-delinker-extension [1], a relocatable object file exporter for Ghidra. Or in other words, a delinker.

Delinking is, essentially, the art of stripping programs for parts. The tricky part is recovering and resynthesizing relocation spots through analysis. It is a punishingly hard technique to get right because it requires exacting precision: mistakes corrupt the resulting object files in ways that can be difficult to detect and grueling to debug. Still, I've managed to make it work on multiple architectures and object file formats; a user community has built up through word of mouth, and it's now actively used in several Windows video game decompilation projects.
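Here is a deliberately naive sketch of the first half of that problem, finding relocation candidates in a 32-bit little-endian data blob: any aligned word that falls inside the image's address range is treated as an absolute pointer and expressed as symbol + addend. The symbol table, address range and `find_pointer_candidates` are hypothetical. A blind scan like this also flags plain integers that happen to look like addresses; the extension instead works from the references recorded in the Ghidra database.

```python
import struct
from bisect import bisect_right

def find_pointer_candidates(data, symbols, lo, hi):
    """Return (offset, symbol, addend) for aligned 32-bit words pointing into [lo, hi).

    symbols: list of (address, name) pairs sorted by address (hypothetical symbol table).
    """
    addrs = [addr for addr, _ in symbols]
    candidates = []
    for off in range(0, len(data) - 3, 4):
        (value,) = struct.unpack_from("<I", data, off)
        if not (lo <= value < hi):
            continue  # not a plausible pointer into the image
        i = bisect_right(addrs, value) - 1  # closest symbol at or below the value
        if i < 0:
            continue  # points before the first known symbol
        sym_addr, sym_name = symbols[i]
        candidates.append((off, sym_name, value - sym_addr))
    return candidates
```

With symbols `[(0x401000, "main"), (0x402000, "g_table"), (0x402010, "g_name")]`, the words `0x402010, 5, 0x402004, 0x401000` yield `g_name+0`, `g_table+4` and `main+0` at offsets 0, 8 and 12, while the constant 5 is skipped.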

Recently I've experimented with Copilot and GPT-5.3 to implement support for several major features, like the OMF object file format and DWARF debugging symbols generation. The results have been very promising, to the point where I can delegate the brunt of the work to it and stick to architecture design and code review. I previously learned the hard way that the only way to keep this extension from imploding on itself was an exhaustive regression test suite, and it appears to guardrail the AI very effectively.

Given that I work alone on this in my spare time, I have a finite amount of endurance and context, and I was reaching the limits of what I could manage on my own. There's only so much esoterica about ISAs, object file formats, toolchains and platforms that can fit in one brain at once, and some features (debugging symbols generation) were simply out of reach. Now it seems I can finally avoid burning out on this project, albeit at a fairly high rate of premium request consumption.

Interestingly enough, I've also experimented with local AI (mostly gpt-oss-20b), and it suffers complete neural collapse when trying to work on this, probably because it's a genuinely difficult topic even for humans.

[1] https://github.com/boricj/ghidra-delinker-extension


I'm working on an embedded project where I'm actually thinking about using ECS on an STM32H5. It's not about cache-friendly locality (wait states on internal SRAM for MCUs are basically a rounding error compared to the DRAM latency seen on desktop or server-class hardware), but the business logic is so complex that traditional inheritance/composition is getting quite messy. When you end up using virtual and diamond inheritance in C++, you know you're in trouble...

It's too bad that ECS isn't more widely known outside of gamedev. It's not just good for performance, it's also legitimately a useful architecture design on its own to solve problems.
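A minimal sketch of ECS as plain architecture, with no performance angle: entities are just IDs, components are plain data kept in one table per type, and systems are functions that run over every entity holding a given set of components. The component names are hypothetical, and firmware on an MCU would more likely use fixed-capacity arrays than Python dicts.

```python
from dataclasses import dataclass

# Hypothetical components: plain data, no behavior, no base classes.
@dataclass
class Voltage:
    volts: float

@dataclass
class Threshold:
    min_volts: float

@dataclass
class Alarm:
    active: bool = False

class World:
    """Stores each component type in its own table, keyed by entity id."""

    def __init__(self):
        self.next_id = 0
        self.stores = {}  # component type -> {entity id: component}

    def spawn(self, *components):
        eid = self.next_id
        self.next_id += 1
        for component in components:
            self.stores.setdefault(type(component), {})[eid] = component
        return eid

    def query(self, *types):
        """Yield (entity, *components) for each entity that has every requested type."""
        stores = [self.stores.get(t, {}) for t in types]
        for eid in sorted(set.intersection(*(set(s) for s in stores))):
            yield (eid, *(s[eid] for s in stores))

# A system: behavior that runs over every entity with the right components.
def undervoltage_system(world):
    for _, voltage, threshold, alarm in world.query(Voltage, Threshold, Alarm):
        alarm.active = voltage.volts < threshold.min_volts
```

Adding a behavior means adding a component type and a system rather than another base class, and an entity spawned with only a `Voltage` is simply skipped by `undervoltage_system`.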


I'm working on ghidra-delinker-extension [1], which is a relocatable object file exporter for Ghidra.

The algorithms needed to slice up a Ghidra database into relocatable sections, and especially to recover relocations through analysis, are really tricky to get right. My MIPS analyzer in particular is an eldritch horror due to several factors combining into a huge mess (branch delay slots, split HI16/LO16 relocations, code flow analysis, register dependency graphs...).
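The HI16/LO16 split alone shows why: a 32-bit address is built with `lui` (upper half) followed by an instruction carrying a sign-extended 16-bit immediate, so the upper half must be rounded up by 0x8000 whenever bit 15 of the address is set. Here is a sketch of the split and of the recombination an analyzer has to perform; the helper names are illustrative.

```python
def split_hi_lo(addr):
    """Split a 32-bit address into %hi/%lo immediates as MIPS assemblers do.

    The LO16 immediate is sign-extended by the CPU, so %hi is rounded up by
    0x8000 to compensate whenever bit 15 of the address is set.
    """
    hi = ((addr + 0x8000) >> 16) & 0xFFFF
    lo = addr & 0xFFFF
    return hi, lo

def combine_hi_lo(hi, lo):
    """Recover the address from a lui immediate and a sign-extended LO16 immediate."""
    lo_signed = lo - 0x10000 if lo & 0x8000 else lo
    return ((hi << 16) + lo_signed) & 0xFFFFFFFF
```

For example, `split_hi_lo(0x80018010)` returns `(0x8002, 0x8010)`: 0x8010 sign-extends to -0x7FF0, which brings 0x80020000 back down to 0x80018010. In real code the `lui` and its LO16 users may be separated by branches and delay slots, and one `lui` can feed several LO16 instructions, which is where code flow and register dependency analysis come into play.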

The entire endeavor requires an unusual level of exacting precision to work and will produce some really exotic undefined behavior when it fails, but when it works you feel like a mechanic in a Mad Max universe, stripping programs for parts and building unholy chimeras from them (I've linked some examples in the readme). It also led to a poster presentation at the SURE workshop at ACM CCS 2025 in Taiwan as a hobbyist, which is an absolutely insane story.

[1] https://github.com/boricj/ghidra-delinker-extension


Mad respect. I tried extracting a clean .o file out of a statically linked ELF once, and it's an absolute nightmare. How are you handling switch tables and indirect jumps? Without dynamic analysis, it's sometimes physically impossible to figure out what a register is actually pointing to.

I have analyzers that resynthesize relocations from the contents of the Ghidra database, no custom annotations required. They evaluate relocation candidate spots through primary references and pointers/instructions and emit warnings if the math doesn't check out.
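As an illustration of "the math doesn't check out", here is a sketch of one such check for two x86-style relocation kinds: given a candidate spot and the target its reference claims, it verifies that the 4 bytes actually encode that target. `check_relocation` and the `"abs32"`/`"rel32"` kind names are hypothetical, and the rel32 case assumes the displacement is the last 4 bytes of the instruction, as with `call`/`jmp rel32`.

```python
import struct

def check_relocation(image, image_base, spot, kind, target):
    """Return None if the 4 bytes at address `spot` encode `target`, else a warning.

    kind "abs32": the field holds the absolute target address.
    kind "rel32": the field holds target - (spot + 4), i.e. it is relative to the
                  end of the field (assumed to be the end of the instruction).
    """
    (raw,) = struct.unpack_from("<I", image, spot - image_base)
    if kind == "abs32":
        expected = target & 0xFFFFFFFF
    elif kind == "rel32":
        expected = (target - (spot + 4)) & 0xFFFFFFFF  # two's complement displacement
    else:
        raise ValueError(f"unknown relocation kind: {kind}")
    if raw != expected:
        return f"{spot:#x}: {kind} to {target:#x} expected {expected:#010x}, found {raw:#010x}"
    return None
```

Masking both sides to 32 bits lets the same comparison handle backward branches, whose displacements are negative.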

It does require a reasonably accurate Ghidra database to work properly, but I've had users delink megabytes of code and data from a program successfully (as in, relinking it at a different address results in a functionally identical executable) once they've cleaned it up. The accuracy warning in the readme is mostly there because it's really complicated to describe exactly which inaccuracies you can get away with; in reality there's a fair amount of wiggle room, as long as you know what you're doing.


I'm working on stuff in that market, and it still largely is. DC Power System Design for Telecommunications is still a must-read, and it doesn't even cover the last 15 years or so of development, notably lithium batteries and high-efficiency rectifiers.

I will say that this is a surprisingly deep and complex domain. The amount of flexibility, variety and scalability you see in DC architectures is mind-boggling. They can range from a 3 kW system that fits in 2U all the way to multiples of 100 kW spanning entire buildings, powered by any combination of grid, solar and/or gas.
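To put that range in perspective, assume the common -48 V nominal telecom bus (an assumption; no voltage is stated above). The bus current is simply power over voltage:

```latex
I = \frac{P}{V}, \qquad
\frac{3\,\mathrm{kW}}{48\,\mathrm{V}} = 62.5\,\mathrm{A}, \qquad
\frac{100\,\mathrm{kW}}{48\,\mathrm{V}} \approx 2083\,\mathrm{A}
```

So every 100 kW step adds roughly two kiloamps of DC current at that voltage, which gives a sense of the busbars and cabling involved at the upper end.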


This appears to be organized top-down, with categories as the entry point. How do we reverse-lookup a story? That is, given a story, how can I find it (if it's there) and walk it back up into its category?


Interesting, I wanted to add that soon actually. What’s your use case?


For stories that are in my favorites or that I've submitted (assuming they've made the cut), I'd like to know how they were classified and which other stories they were grouped with. So instead of browsing this from the outside in, I'd browse it from the inside out.
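Assuming the categories form a tree with stories as leaves and IDs unique across the whole tree (the actual data layout isn't shown here, so the shape below is hypothetical), the reverse lookup amounts to inverting the tree once: build a child-to-parent index, walk it up from any story, and read the siblings off the parent's child list.

```python
def index_tree(tree):
    """Build parent and children indexes from a {"id": ..., "children": [...]} tree."""
    parent_of, children_of = {}, {}
    stack = [(tree, None)]
    while stack:
        node, parent = stack.pop()
        parent_of[node["id"]] = parent
        kids = node.get("children", [])
        children_of[node["id"]] = [kid["id"] for kid in kids]
        stack.extend((kid, node["id"]) for kid in kids)
    return parent_of, children_of

def locate(parent_of, children_of, story_id):
    """Return (path from story up to root, sibling ids), or None if the story is absent."""
    if story_id not in parent_of:
        return None
    path = [story_id]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    parent = parent_of[story_id]
    siblings = [s for s in children_of.get(parent, []) if s != story_id]
    return path, siblings
```

On a tree `root → Programming → Reverse engineering → {101, 102}`, `locate(parent_of, children_of, 102)` returns `([102, "Reverse engineering", "Programming", "root"], [101])`.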


That article made me chuckle.

I'm currently building a full-blown OpenAPI toolchain at work, where the OpenAPI document itself is the AST. It contains passes for reference inlining, document merging, JSON Schema validation, C++ code generation and has further plans for data model bindings, HTML5 UI...
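To give a flavor of what one of those passes looks like when the parsed JSON document is the AST, here is a sketch of reference inlining: every local `{"$ref": "#/..."}` is replaced by a copy of the node it points to, using RFC 6901 JSON Pointer unescaping. This is not the toolchain described here; it assumes only local references, drops any sibling keys next to `$ref`, and simply rejects recursive schemas, which cannot be fully inlined.

```python
def resolve_pointer(doc, ref):
    """Resolve a local JSON Pointer reference such as '#/components/schemas/Device'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = doc
    for token in ref[2:].split("/"):
        # RFC 6901 unescaping: '~1' -> '/', then '~0' -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

def inline_refs(doc):
    """Return a new document with every local {"$ref": ...} replaced by its target."""
    def walk(node, seen):
        if isinstance(node, dict):
            if "$ref" in node:
                ref = node["$ref"]
                if ref in seen:
                    raise ValueError(f"recursive reference cannot be inlined: {ref}")
                return walk(resolve_pointer(doc, ref), seen | {ref})
            return {key: walk(value, seen) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item, seen) for item in node]
        return node  # scalars are immutable, so they can be shared
    return walk(doc, frozenset())
```

Because dicts and lists are rebuilt rather than mutated, the input document is left untouched and later passes can run on the inlined copy.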

Why? Because I'm working on a new embedded system which has a data model so complex, it blew past 10k lines of OpenAPI specifications with no end in sight. I said "ain't no way we're implementing this by hand" and embarked on the mother of all yak shavings.

I want all of the boilerplate/glue code derived from a single source of truth: base C++ data classes, data model bindings, configuration management, change notifications, REST API, state replication for device twins and more. That way we can focus on the domain logic instead, which is already plenty complex on its own.

I'm not designing all of this to be simple to develop. I'm designing it so that it's simple for the developers. Even with the incomplete prototype I have currently, the team is already sold ("you mean I just write the REST API specification and it generates all of the C++ classes for me to inherit?"). The roadmap of features for that toolchain is defined, clear and purposeful: to delete mountains of menial, bug-prone source code before it is ever written by hand.
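A toy version of "write the spec, get the C++ classes" fits in a few lines: walk a flat object schema and emit a struct. The type mapping, the `DeviceBase` name and the use of `std::optional` for non-required properties are illustrative assumptions, not this toolchain's actual output.

```python
# Assumed mapping from OpenAPI primitive types to C++ types.
TYPE_MAP = {
    "string": "std::string",
    "integer": "std::int64_t",
    "number": "double",
    "boolean": "bool",
}

def generate_struct(name, schema):
    """Emit a C++ struct for a flat OpenAPI object schema (no nesting, no arrays)."""
    required = set(schema.get("required", []))
    lines = [f"struct {name} {{"]
    for prop, prop_schema in schema.get("properties", {}).items():
        cpp_type = TYPE_MAP[prop_schema["type"]]
        if prop not in required:
            cpp_type = f"std::optional<{cpp_type}>"  # absent fields are representable
        lines.append(f"    {cpp_type} {prop};")
    lines.append("};")
    return "\n".join(lines)
```

For a schema with a required string `id` and an optional number `voltage`, this produces `struct DeviceBase { std::string id; std::optional<double> voltage; };` (one member per line), which developers would then inherit from to add domain logic.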

Sometimes, it takes complexity to deliver simplicity. The trick is to nail the abstractions in-between.


I wrote something similar a while ago: https://github.com/boricj/hang-os

It handles interrupts/traps and targets the aarch64 QEMU virt platform. It also features a HAL.


Might as well plug in my own extension: https://github.com/boricj/ghidra-delinker-extension

It's a relocatable object file exporter that supports x86/MIPS and ELF/COFF. In other words, it can delink any program selection and you can reuse the bits for various use-cases, including making new programs Mad Max-style.

It carved itself a niche in the Windows decompilation community, used alongside objdiff or decomp.me.


Easily one of the coolest RE projects out there, I've always looked on in awe.

> The relocation table synthesizer analyzer relies on a fully populated Ghidra database (with correctly declared symbols, data types and references) in order to work

It's a shame that this requirement exists (I am well aware that it's a functional necessity), because all the stuff I want to relink is far too big to make a full db!


You only need a full DB if you want to fully delink your artifact. You can just clean up the subset you're interested in exporting (the "fully populated" disclaimer is just there because, while there's a lot you can get away with, you need to know precisely what you are doing).

Even then, a full DB is quite achievable, even on large projects. The biggest public project using ghidra-delinker-extension out there is the FUEL decompilation: https://github.com/widberg/FUELDecompilation

The executable is 7 MiB, has over 30,000 functions and more than 250,000 relocation spots. The user made the game relocatable in six weeks (four of which were spent debugging issues with my extension). Despite the artifact having been built with LTO, they then managed to replace code by binary-patching __usercall support into MSVC.

There's a write-up about all of this that's well worth a read: https://github.com/widberg/fmtk/wiki/Decompilation

I've also had one user manage to fully delink the original Halo on the Xbox in one week. To be fair, they were completely nerd-sniped and worked non-stop on it, but it still counts.


Whoah, that is super impressive. My target binary is 9MiB, seemingly only 10k relocations IIRC but 37k functions.

I might try a partial delink and see how it goes!


Where can I learn more about the Windows decompilation community? (This is an area I kind of work in, and I am interested in participating!)


Most of my known userbase hangs out in the decomp.me Discord server. Each project also tends to have its own dedicated Discord server.

The Windows decompilation community is far more fragmented than the console one, as it hasn't coalesced around a common set of tools like splat or decomp-toolkit.


What is Mad Max-style?


I imagine PIE chunks that you can kludge into other programs to Frankenstein implementations? Kind of like how Mad Max cars are made of bits and pieces bolted together.


Indeed, you can kludge anything together into working chimeras, as long as you can mend the ABIs together.

I've done a case study where I've ported a Linux a.out program into a native Windows PE program without source code: https://boricj.net/atari-jaguar-sdk/2023/11/27/introduction....

Another case study was ripping the archive code from a PlayStation game and stuffing it into a Linux MIPS program to create an asset extractor: https://boricj.net/tenchu1/2024/03/18/part-6.html


You sir are a true wizard!


Half-assing self-hosting sucks, regardless of the underlying platform. You tie things together with shoestrings and gum, leaving ticking timebombs and riddles to your future self.

This is the point where I'm supposed to describe my self-hosting solution on my so-called homelab, where my blog lives. I won't, because it's both stupid in smart ways and smart in stupid ways, therefore it sucks all the way.

Self-hosting is like any hobby. Half-ass it and you'll half-like it.


I've just started a new personal project, a C++20 library for running composable visitors over data documents and data models with JSON/CBOR semantics, DOM-less.

Basically, if you define a data model with bindings, you can inject data into it or extract data from it by running SAX-style visitors. You can use serializers/deserializers for standard formats like JSON/BSON/CBOR/CSV, or define custom formats to format structured data however you want. You can also run a serializer visitor on a deserializer to convert between formats. You can compose filter visitors to extract a subtree or filter out keys. And it's designed to fit on microcontrollers with very limited dynamic memory allocation, because it either streams data on the fly or works directly with the underlying data format in a big preallocated buffer.

I worked with libraries that offered a subset of these features before in my professional career (even built one myself), but recently I've had an epiphany (a document can also be used as a data model) that makes me think I can create something elegant and unique.
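The library itself is C++20 and its API isn't shown here, so this is only a Python sketch of the visitor idea with hypothetical names: `drive` plays the deserializer by emitting SAX-style events from an in-memory value (standing in for a streaming parser), `JsonWriter` is a serializer visitor, and `SubtreeFilter` is a filter visitor that forwards one subtree to whatever visitor sits downstream. The visitors themselves never build an intermediate DOM.

```python
import json

def drive(value, visitor):
    """Emit SAX-style events for an in-memory value (stand-in for a streaming parser)."""
    if isinstance(value, dict):
        visitor.begin_object()
        for key, item in value.items():
            visitor.key(key)
            drive(item, visitor)
        visitor.end_object()
    elif isinstance(value, list):
        visitor.begin_array()
        for item in value:
            drive(item, visitor)
        visitor.end_array()
    else:
        visitor.scalar(value)

class JsonWriter:
    """Serializer visitor: turns events into compact JSON text."""

    def __init__(self):
        self.parts = []
        self.first = [True]     # per open container: nothing written in it yet
        self.after_key = False  # the next value directly follows '"key":'

    def _separate(self):
        if not self.first[-1]:
            self.parts.append(",")
        self.first[-1] = False

    def _before_value(self):
        if self.after_key:
            self.after_key = False  # key() already wrote the separator
        else:
            self._separate()

    def _open(self, bracket):
        self._before_value()
        self.parts.append(bracket)
        self.first.append(True)

    def _close(self, bracket):
        self.first.pop()
        self.parts.append(bracket)

    def key(self, name):
        self._separate()
        self.parts.append(json.dumps(name) + ":")
        self.after_key = True

    def begin_object(self): self._open("{")
    def end_object(self): self._close("}")
    def begin_array(self): self._open("[")
    def end_array(self): self._close("]")

    def scalar(self, value):
        self._before_value()
        self.parts.append(json.dumps(value))

    def text(self):
        return "".join(self.parts)

class SubtreeFilter:
    """Filter visitor: forwards only the value under `wanted` in the root object."""

    def __init__(self, wanted, downstream):
        self.wanted = wanted
        self.downstream = downstream
        self.depth = 0  # nesting depth of the incoming event stream
        self.forwarding = False

    def key(self, name):
        if self.forwarding:
            self.downstream.key(name)
        elif self.depth == 1 and name == self.wanted:
            self.forwarding = True  # the next value is the subtree we want

    def _begin(self, event):
        if self.forwarding:
            getattr(self.downstream, event)()
        self.depth += 1

    def _end(self, event):
        self.depth -= 1
        if self.forwarding:
            getattr(self.downstream, event)()
            if self.depth == 1:
                self.forwarding = False  # selected subtree is complete

    def begin_object(self): self._begin("begin_object")
    def end_object(self): self._end("end_object")
    def begin_array(self): self._begin("begin_array")
    def end_array(self): self._end("end_array")

    def scalar(self, value):
        if self.forwarding:
            self.downstream.scalar(value)
            if self.depth == 1:
                self.forwarding = False
```

Chaining them converts and filters in one pass: `drive(doc, SubtreeFilter("device", writer))` leaves only the `device` subtree in `writer.text()`. In this sketch, swapping `JsonWriter` for another format's writer, or `drive` for a real parser, would not change the other pieces.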

