Hair Today Gone Tomorrow: Four Men Go To Fix A Wafer Prober
I’ve had a fairly varied early part of my career in the semiconductors business: a series of events caused me to jump disciplines a little bit, and after one such event, I landed in the test engineering department at Philips Semiconductors. I was tasked with a variety of oddball projects, supporting engineering work and fixing broken ATE equipment, and was given an absolute ton of training. Good times! Here’s a story that comes straight off the oddball pile.
We needed to assemble a crack team of experts and high-tail it to deepest darkest Wales, and sort out an urgent production problem. The brief was that the wafer probe yield was disastrous and the correlation wafer was not giving the correct results. Getting to the punch line is going to require some IC fabrication background, but if you like stories about silicon, or red-bearded test engineers, it’s worth it.
The naïve IC design is one that forgoes testability in pursuit of die area, development time and some misplaced notion of simplicity. I cannot say how many times I’ve come across stories from the past where a new startup proudly makes silicon, but has almost no testability features and they get really hammered when trying to ramp up production.
The chip may be ‘right by design’ in that it is targeted correctly to the fabrication process, and on the bench, the product validation process may show it is within specification and appropriate for the customer’s end application. However, once that’s done, we need a verification process to ensure the chips going out the door adhere to that specification under normal process variances, and that they are defect free. See this interesting guide from NASA about this important distinction.
The test engineer’s first port-of-call is the concept of Zero Defect Testing, which relies on design support known as Design For Test (DFT). This basic guide will serve as an introduction to these concepts for those interested.
These testing methodologies require functional changes to the digital parts of the design, made by a DFT engineer. They are almost always automated by dedicated tools as part of the digital synthesis flow, with the test patterns needed to activate ‘scan test mode’ generated by a special ATPG (Automated Test Pattern Generator) tool. This analyses the gate-level netlist of the IC and figures out how to perform the required structural tests. One example that I’ve used in the past is Synopsys TetraMAX. There are many, many variations on this test scheme, such as Logic-BIST (Built-In Self-Test), but that’s a subject for another day. Anyway, the goal here is to get the test patterns needed to feed into the test development process, further down the line.
Once the IC design is completed, the thing needs to be manufactured. This is done in hugely complicated, immensely expensive facilities, hereafter referred to as the ‘fab’, in a handful of locations around the world. Companies like TSMC, SSMC, or Tower will manufacture your design on a contract basis, shipping wafers off to wherever they need to go to get tested and packaged up.
Wafers are shipped out in a plastic ‘boat’, typically 25 at a time, to test-and-assembly facilities all over the world, where they optionally may be tested, then sawn up and packaged. Each wafer will have hundreds or thousands of identical copies of the device, stepped on a rectangular grid over the surface.
Depending on the assembly process, what happens next will work in different ways. Older or simpler designs may just have a simple pad ring surrounding the core of the chip, with metallised contact areas for each connection (called ‘pads’). When packaged, the freshly sawn (or ‘singulated’) die will be glued to a lead frame (a metallic scaffold) before gold (or copper, lately) bond wires are attached between this lead frame and the chip pads, using a wire-bonder machine. The whole thing is then stuffed in a mould and plastic is injection moulded around it, before the packages are singulated ready for ‘final test’. Many things could have gone wrong up until this point, so it is critical to test the finished parts before they are packed for distribution.
From a test perspective, the wafers will undergo testing as they are processed by the fab. There will be a number of special test dies dotted around the wafer, mixed in with the application dies. These are referred to as Process Control Monitors (PCMs) and are used by the fab internally to measure wafer quality and ensure that the fabrication process is within specified parameters (hence ‘process control’). Select PCM data are delivered to the end customer by an automated process, to be accessible by the product engineer (the engineer in charge of product manufacturing) if they need it. Once the finished wafers are shipped out to the test house, they might undergo wafer probe testing. If the package cost is low, then the yield impact of not testing before packaging could be low enough that it is not economically viable to test before packaging. We call these ‘blind assembled’ packages, and so long as the ‘final test’ is good quality, we can be sure the customer will get fully working, in-spec devices. However, if the package is complex and expensive, wafer probing becomes necessary. And of course, if the product is a Chip Scale Package (e.g. flip-chip) there won’t be a package (or a lead frame for that matter) as such, just a bunch of solder balls stuck onto the RDL (redistribution layer), but that’s a different problem.
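The blind-assembly trade-off can be put into rough numbers. The sketch below is a deliberately simplified cost model, not anything taken from a real production flow: it assumes wafer probe catches every bad die, that packaging itself never damages a good die, and that all the per-unit costs and yields shown are made-up illustrative figures.

```python
# Simplified cost-per-good-device model (all figures hypothetical).
# Assumptions: probe catches every bad die; assembly yield is 100%.

def cost_blind(die, package, final_test, wafer_yield):
    """Every die is packaged and final-tested; bad ones are thrown away."""
    return (die + package + final_test) / wafer_yield

def cost_probed(die, probe, package, final_test, wafer_yield):
    """Every die is probed; only good die are packaged and final-tested."""
    return (die + probe) / wafer_yield + package + final_test

def probe_pays_off(probe, package, final_test, wafer_yield):
    """Probing wins when it costs less than the packaging wasted on bad die."""
    return probe < (package + final_test) * (1 - wafer_yield)

# Cheap package, high yield: blind assembly is the cheaper route.
cheap = dict(package=0.03, final_test=0.02, wafer_yield=0.95)
# Expensive package, lower yield: probing first is the cheaper route.
costly = dict(package=1.50, final_test=0.10, wafer_yield=0.80)

for name, case in (("cheap", cheap), ("costly", costly)):
    blind = cost_blind(die=0.10, **case)
    probed = cost_probed(die=0.10, probe=0.02, **case)
    print(f"{name}: blind={blind:.3f} probed={probed:.3f} "
          f"probe pays off: {probe_pays_off(0.02, **case)}")
```

Real test-cost planning has many more terms (retest, probe card wear, tester time, escapes), so treat this only as a way to see why package cost and yield dominate the decision.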
The general plan is to use testing at a variety of points to improve overall quality, by weeding out bad wafers (PCM data) and bad die (wafer probe), and then packaging up the good die, before a final quality check. At either of the latter stages, product may also be speed/performance graded (or binned) for different final specs. It all depends on the product. What tests are done at which stage also depends on a multitude of factors, such as test feasibility with the available ATE platforms, and the expected failure rates vs cost of test. Test-cost economics and subsequent planning is a tricky subject.
So let’s talk about the tools of the trade. Automated Test Equipment (ATE) is a bit of a broad term, but we’re dealing here with dedicated IC test platforms, so that narrows the scope a little. The target platform was the older Teradyne Catalyst system. This machine is an absolute beast, physically huge and heavy, and sucks 3-phase power like it’s going out of fashion.
The so-called mainframe box, which contains big power supplies, support circuits, and waveform sources (if RF is installed), is the size of a double-wide US-style refrigerator but weighs in at a couple of tons. This is attached via an umbilical to the test head. This last bit is the business end: it contains the test electronics and local power supplies, and is where the tester interfaces to the application.
The idea with these platforms is that rather than designing a custom test rig per product, with a full suite of equipment tied to the requirements of the device, you design the chip to be testable within the capabilities of a general-purpose platform. Then you just need to configure one with enough resources to interface with your device in its test modes.
A typical ATE test head will be a large box about the size of a washing machine (but likely weighing much more) containing some really expensive electronics. They’re usually arranged with a few columns of parallel slots; each slot will have a pogo pin block, sometimes with hundreds of pogos per block. Some slots may be empty, some may have special connections for RF and high current. Underneath, the pogo blocks are connected with cable bundles down to the internal card cage.
These cards use local timing generators (linked to low-noise clocks) to format patterns into waveforms that are driven into the IC under test. It’s not unusual at all for a test head to contain upwards of 2000 digital channels operating in parallel at 800 Mbps or more. These cards also contain Pin Parametric Measurement Units (PPMUs) capable of sourcing or sinking current into a pin and taking measurements. This allows defects such as disconnected or shorted pins, and pin leakage current, to be tested. Remember that last bit for later! A typical ATE test head is capable of testing multiple ICs in parallel, if resources allow. As I said earlier, these are generic platforms, and it depends on the needs of the product.
This is where the generic ATE bit stops and the custom per-design bit starts. In order to test a chip on an ATE, you need an interface. If the product requires wafer level testing, then more hardware is needed. In addition to a manual test rig, I will also have been designing a probe card, which will connect to the ATE head with a generic adaptor called a pogo tower.
The interface to the wafer is via a custom-designed circular probe card PCB, which has the matching ring of pogo lands (just surface pads) and a big hole in the middle. The space in the middle is to take the probe ‘spider’ which is a circular array of fine, stiff wires, each with a bend in at the end, terminating in a sharp point. The final shape of this array of points, the probe ‘needles’, is designed to align with the pads on the die, and with some suitable epoxy stiffener spread over the wire array, the probe tips will be perfectly aligned and co-planar. This is a very specialist job to construct and they are delicate assemblies.
Wafer probers are cool. The idea is you bolt the wafer probe card into the top of the machine, needles-down, then wheel the whole thing over to the ATE. The ATE head will have that pogo-tower already docked. Next you manipulate the test head up, over your head so you can turn it upside down and dock it down onto the top of the wafer prober machine, locking onto the probe card with a twist of a mechanism.
The prober machine will push the wafer up very carefully, into the fixed probe needle array, making all the electrical contacts. It then instructs the tester to test one or more dies and collects the binning results, before stepping onto the next die. This is repeated for the whole wafer, and each wafer in the cassette. The X-Y coordinate of the die and test results are saved to a datalog stream, so wafer maps and yield can be obtained. The wafer map is a lookup table of test result vs die location, and this is what is fed into the assembly line to pick out the good die to package. This used to be done by the machine ink-spotting bad die!
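To make the wafer map idea concrete, here is a minimal sketch of turning a datalog stream into a map, a yield figure, and a list of good die. The record layout and the tiny six-die dataset are invented for illustration; the only thing borrowed from the story is that bin 1 means a passing die. Real datalogs carry far more detail per die.

```python
# Hypothetical datalog records: (die_x, die_y, bin). Bin 1 is assumed to be "pass".
PASS_BIN = 1

datalog = [
    (0, 0, 1), (1, 0, 1), (2, 0, 5),
    (0, 1, 1), (1, 1, 3), (2, 1, 1),
]

def build_wafer_map(records):
    """Lookup table of bin result keyed by (x, y) die coordinate."""
    return {(x, y): b for x, y, b in records}

def yield_fraction(wafer_map):
    """Fraction of probed die that landed in the pass bin."""
    good = sum(1 for b in wafer_map.values() if b == PASS_BIN)
    return good / len(wafer_map)

def render(wafer_map):
    """Text picture of the map: '.' good die, 'X' bad die, ' ' no die."""
    xs = [x for x, _ in wafer_map]
    ys = [y for _, y in wafer_map]
    rows = []
    for y in range(min(ys), max(ys) + 1):
        row = ""
        for x in range(min(xs), max(xs) + 1):
            if (x, y) not in wafer_map:
                row += " "
            elif wafer_map[(x, y)] == PASS_BIN:
                row += "."
            else:
                row += "X"
        rows.append(row)
    return "\n".join(rows)

wafer_map = build_wafer_map(datalog)
# The pick list that assembly would use instead of ink spots.
good_die = sorted(xy for xy, b in wafer_map.items() if b == PASS_BIN)

print(render(wafer_map))
print(f"yield: {yield_fraction(wafer_map):.1%}")
print("good die:", good_die)
```

The `good_die` list plays the role of the electronic replacement for ink-spotting: the assembly line only picks the coordinates it contains.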
So that’s the why and how of automated IC test and the development process; back to our story! Our crack debug team consisted of an expert in wafer probers, a mechanical designer, the visiting test floor manager from Philips Semiconductors Calamba (Philippines), and me, at the time a test engineer-in-training.
The product was a mature long-running product, built on 200 mm wafers, an older process node (most likely TSMC 350 nm) and was very much just ticking along. Until one day it wasn’t. Wafer probing was courtesy of an old Electroglas EG4080 (pdf of latest 4090 linked) docked up to a Teradyne Catalyst. Quite a common configuration at that time. The yield had dropped to zero, so the first thing they did was to run some correlation wafers that were kept onsite, in a special inert-atmosphere cabinet. These were the same wafers used during development, with all test results kept on file, and used for, well, correlation purposes when bringing up a new test stand. They could not get a single die to pass, so we were called in to sort it out.
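A correlation check boils down to comparing a freshly probed map against the results on file for the same wafer. The sketch below shows one way that comparison might look; the dictionary layout, the four-die example, and bin 7 as a "leakage fail" bin are all assumptions made for illustration, not details of the actual setup.

```python
# Hypothetical correlation check: compare a new probe run against the
# stored reference results for the same wafer. Maps are {(x, y): bin}.

def correlation_report(reference, measured):
    """Return (match_rate, mismatches) over die present in both maps."""
    common = reference.keys() & measured.keys()
    mismatches = {
        xy: (reference[xy], measured[xy])
        for xy in sorted(common)
        if reference[xy] != measured[xy]
    }
    match_rate = 1 - len(mismatches) / len(common) if common else 0.0
    return match_rate, mismatches

# Results kept on file from development (bin 1 = pass, bin 5 = some known fail).
reference = {(0, 0): 1, (1, 0): 1, (0, 1): 5, (1, 1): 1}
# What a broken test stand might report: every die in an assumed leakage bin 7.
measured = {(0, 0): 7, (1, 0): 7, (0, 1): 7, (1, 1): 7}

rate, diffs = correlation_report(reference, measured)
print(f"match rate: {rate:.0%}")
for xy, (expected, got) in diffs.items():
    print(f"die {xy}: expected bin {expected}, got bin {got}")
```

Because the reference wafer is a known quantity, a match rate this bad points at the test stand rather than at the silicon, which is exactly why the team went looking at the equipment.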
We split up and checked everything. Prober guy checked the EG4080 setup and calibration, looked at it mechanically, and inspected the ‘probe marks’ using the built-in high zoom inspection camera. These witness marks had the characteristic ‘short furrow’ shape, indicating that the needles contacted, then slid slightly across the pad, digging a tiny trench into it. This is because, once the needles make contact, the wafer is driven a little further into the needles, causing them to slide, or scrub, laterally. This promotes oxide scrubbing, giving a good electrical contact, and keeps the tips sharp.
There was no evidence of over-driving, or needles deflecting the ‘wrong’ way, and no significant tip wear. Mechanical guy was heavily involved with this process as well as keeping track of tests and checks, and collating all data. I checked the ATE calibration, and inspected all parts of the system, checked that the test program was the correct revision, and initiated a few tests under manual control. I could see which test was failing: pin leakage. Everything else was fine.
This particular test looks for defects in the pad ring, surrounding the die core, or possibly within the package. This works by programming any bidirectional pins to input mode, then sinking a small test current into all input-capable pins, using that PPMU. The results were a bit odd, two pins were giving quite random results, but that was the only obvious fault. We tore it all down again, inspecting every component visually, cleaning contacts, removing dust and contamination, and just looking for any strange current path that could explain what we saw.
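When chasing this kind of fault it helps to look at repeatability as well as the pass/fail limit, since a pin that reads differently on every touchdown is suspicious even when some readings pass. The following sketch flags pins on both counts. The pin names, readings, units, and limits are entirely made up, and it says nothing about how the real test program took its measurements.

```python
from statistics import pstdev

# Hypothetical limits, in microamps. Real limits come from the device spec.
LEAKAGE_LIMIT_UA = 1.0   # worst-case absolute leakage allowed
SPREAD_LIMIT_UA = 0.2    # allowed run-to-run standard deviation

# Made-up repeated leakage readings per pin, one value per manual test run.
readings = {
    "PIN_A": [0.02, 0.03, 0.02, 0.02],
    "PIN_B": [0.01, 0.85, 0.30, 1.40],
    "PIN_C": [0.03, 0.02, 0.03, 0.03],
}

def suspect_pins(readings):
    """Return {pin: [reasons]} for pins that exceed the limit or are unstable."""
    suspects = {}
    for pin, values in readings.items():
        reasons = []
        if max(abs(v) for v in values) > LEAKAGE_LIMIT_UA:
            reasons.append("over limit")
        if pstdev(values) > SPREAD_LIMIT_UA:
            reasons.append("unstable")
        if reasons:
            suspects[pin] = reasons
    return suspects

for pin, reasons in suspect_pins(readings).items():
    print(pin, "->", ", ".join(reasons))
```

A pin flagged as unstable, rather than consistently over the limit, hints at an intermittent stray current path somewhere in the contact chain rather than a fixed defect in the die.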
Since we couldn’t find anything wrong with the equipment, we started to look at how it was being used. The test floor was typical – climate controlled with a metallised ESD safe floor. All equipment was correctly grounded, all staff fully adorned with overalls, hair coverings and heel grounders. We were just standing around shooting random ideas, talking about contamination and possible environmental factors, when I just blurted out “what if there’s a hair stuck in the probe needles?”
Pretty much everybody stood there and said that wasn’t possible: all operators tied their hair up under a tight covering, and even beards were covered. “Surely we’d have seen a hair on the cameras?” Nothing had been seen. “But,” I asked, “has anybody actually taken the probe card out and looked at it under a microscope?” “No,” was the answer; as it was delicate, they didn’t want to handle it unnecessarily.
The prober operator just unbolted it and held it up to the light and peered at it closely with a loupe. After a short discussion amongst the floor staff, some tweezers appeared, and the chap very, very carefully pulled out a short, curly ginger hair from between the needles. It went quite quiet on the floor as he proceeded to reassemble the test stack, docked the ATE head, and initiated a manual test on the prober. “Bin 1!” was the exclamation! Switching over to automatic mode, the wafer was probed, and the resulting wafer map on the screen looked exactly like the reference data. Success! The operator in question had a big, ginger beard. We laughed, packed our gear and headed for home. Millions of pounds of electronics equipment, nearly scuppered by a rogue beard hair. You couldn’t make it up.