And that's such a hard problem that you don't even have to go down to CPU scale for it to matter.
If you look at mainboards (or some smaller PCBs), the traces between e.g. RAM and CPU that run closer to the centre are often zigzagged (serpentined) to match the lengths – and delays – of the necessarily longer outer traces. The speed of light is actually really damn slow.
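To put a number on "really damn slow": in FR4 the signal travels at roughly c/√εr, so around 7 ps per millimetre of trace. A back-of-envelope sketch (the εr value of 4.3 is a typical assumed figure for FR4, not a measured one):

```python
# Back-of-envelope: propagation delay on an FR4 trace and the skew
# from a length mismatch. ER_EFF is an assumed typical value.
C = 299_792_458                       # speed of light in vacuum, m/s
ER_EFF = 4.3                          # assumed effective dielectric constant of FR4

v = C / ER_EFF ** 0.5                 # propagation velocity in the board, m/s
delay_per_mm_ps = 1e12 / (v * 1000)   # picoseconds per millimetre of trace

mismatch_mm = 25.4                    # a 1-inch length mismatch between two lanes
skew_ps = mismatch_mm * delay_per_mm_ps
print(f"{delay_per_mm_ps:.2f} ps/mm; 1 in mismatch -> {skew_ps:.0f} ps of skew")
```

About 7 ps/mm, so a one-inch mismatch costs ~175 ps – a big chunk of a bit period at DDR speeds, which is exactly why those serpentines exist.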
Well, not entirely. For data signals in general you just need all the lines to resolve to the desired value before the clock edge. You do need to be very careful about the clock itself, though.
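"Resolve before the clock" is just a timing budget: driver clock-to-out plus flight time plus receiver setup has to fit inside the period. A crude sketch with made-up illustration numbers (none of these are from any real datasheet):

```python
# Crude synchronous timing-budget check: does the data line settle
# before the capturing clock edge? All values are illustrative.
clock_period_ps = 625     # e.g. a 1.6 GHz clock
t_co_ps = 150             # driver clock-to-output delay
t_prop_ps = 300           # flight time along the trace
t_setup_ps = 100          # receiver setup requirement

margin_ps = clock_period_ps - (t_co_ps + t_prop_ps + t_setup_ps)
print(f"setup margin: {margin_ps} ps")  # positive => the line resolves in time
```

The point is that data lines only have to land somewhere inside that margin, while the clock's edge placement is the reference everything else is measured against – hence the asymmetry in how careful you have to be.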
Surprisingly, the behavior at those speeds is fairly well understood. I don't think you even need to move from FR4 to a more controlled material (generally in mmWave, Isola FR408, along with sometimes Rogers laminates, IME, is what's used). Allegro PCB SI (Signal Integrity) even models high-speed timing fairly well at the design stage.
You've got plenty of options at the test stage too. E.g., high-end LeCroy metrology test gear goes up to 100 GHz. Agilent (Keysight, whatever, it's still HP to me) has a full test rig for USB 3.1[0] at 10 Gbps for their consumer-level gear (again, fairly slow). Step it up to FPGA speeds and here's[1] an app note by Altera with far higher speeds.
Here's a really good brief overview of 'rules of thumb' that work up to PCIe[2], by TI. 50 minutes and worth a watch. Clean power that won't couple in, matched line lengths on diff pairs, properly isolated ground planes (give AGND and DGND their own layers), and proper termination will easily get you 95% of the way there.
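For the diff-pair matching rule of thumb, the number that matters is intra-pair skew as a fraction of the unit interval. A quick sanity-check sketch (the 8 Gbps rate, the ~6.9 ps/mm FR4 delay, and the 0.5 mm mismatch are all assumed example values):

```python
# Rule-of-thumb check: intra-pair length mismatch vs. the unit interval.
# All numbers are assumed example values, not a spec requirement.
bit_rate_gbps = 8.0                  # e.g. a PCIe Gen3-class link
ui_ps = 1000.0 / bit_rate_gbps       # unit interval in ps (125 ps at 8 Gbps)
delay_per_mm_ps = 6.9                # assumed FR4 propagation delay
mismatch_mm = 0.5                    # intra-pair length mismatch

skew_ps = mismatch_mm * delay_per_mm_ps
pct_of_ui = 100.0 * skew_ps / ui_ps
print(f"UI = {ui_ps:.0f} ps, skew = {skew_ps:.2f} ps ({pct_of_ui:.1f}% of UI)")
```

Half a millimetre of mismatch is only a few percent of the UI at 8 Gbps, which is why the usual layout guidance of matching pairs to well under a millimetre keeps you comfortably inside the eye.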
(Shameless promotion-- available for high-speed design, pre-EMC compliance testing, fault diagnostics, etc).