By William Van Winkle
 
 
COMING MODELS

The 790FX launched in November as the 7 Series’s opening shot, but more SKUs will follow soon. In February, we’ll see the 780G (RS780). This chip will enable one CPU and one physical x16 PCIe slot backed by HyperTransport 3.0 and PCI Express 2.0. The major call-out for this part is its integrated graphics, which will be accompanied by sideport memory for the local frame buffer. AMD remains mum as of this writing on which exact IGP core will inhabit the 780G. However, given that the 690G went with the Radeon X1250, adoption of something close to the Radeon HD 2350 seems likely, with one important exception. The HD 2350 is an OEM part that went out with the core’s UVD functionality disabled. As we’ll see below, UVD is a critical element in offering support for efficient playback of HD DVD and Blu-ray media, and we know that the 780G will support UVD, HDCP decryption, and DirectX 10. Note that the more value-oriented 780V edition of this chip will preserve HDCP and DirectX 10 functionality, but other areas, including multi-monitor output (SurroundView) and Avivo HD, will either be curtailed or absent.

Three months later, in May, AMD will release the mobile variant of the 780G. Assuming there’s sufficient availability in whitebook designs, this could offer a noteworthy alternative to the similarly equipped X4500 IGP Intel is planning for its Q2 Centrino refresh, code-named Montevina. A discrete-only version of the 780G, dubbed the 770 (RX780), has already slipped into the public eye by way of Universal abit (above) and others. This board delivers support for PCIe 2.0 and HT3.0, plus it pairs the 770 northbridge with the SB600 southbridge.

As a rule of thumb, keep in mind that the 790FX chipset is for boards featuring four graphics slots. (And don’t think that these will be rare birds. There are 10 FX boards from eight vendors slated to launch with Spider.) The 790X chipset will appear on boards with two slots, and the 770 chipset is for boards with a single PCIe graphics connector.

Continuing further into the value range, look for the 740G, 740V, and discrete 740 chips. These northbridges will support one x16 PCIe graphics slot, but the HyperTransport and PCI Express links are both version 1.0. Additionally, the integrated graphics parts will likely use an older core as they only support DirectX 9, not that bargain-oriented value systems are likely to need DX10 in the near term.

King of the Quad Hill
Among the 7 Series chipsets, the 790FX sits uncontested. With support for quad-graphics, PCIe 2.0, and much more, the 790FX is AMD’s premier showcase for quad-core Phenom.

And since we’re discussing budget buyers, be aware that the forthcoming SB700 and SB750 southbridges will not only support hybrid (flash memory-equipped) hard drives but also HyperFlash. HyperFlash is AMD’s alternative to Intel’s Turbo Memory (ITM). It is essentially a 512MB, 1GB, or 2GB mass of flash memory on a small PCB that acts as a system-level file cache and accelerates overall performance, especially in PCs with lower amounts of RAM. Whereas ITM plugs into a mini-PCIe connector, HyperFlash will plug into an ATA connector.

Performance improvements with ITM have been spotty at best, which is why, even though Intel’s 3 Series chipsets support ITM, you don’t see Intel desktop boards running around with flash memory on them (so far anyway). Shipping HyperFlash isn’t expected until at least February 2008, and we remain just as skeptical of this enhancement on the AMD side as we are with Intel. For a few dollars more, increasing the system’s RAM will generally do a lot more to boost overall performance.

Much more significant for these two southbridges is the inclusion of DASH 1.0 support. This is an extremely important update for those selling commercial desktop PCs. Essentially, if you were to take most of the remote management aspects of Intel’s vPro platform, you’d have DASH 1.0. (Note that the DASH spec, short for Desktop and mobile Architecture for System Hardware, is an open industry effort of the Distributed Management Task Force [DMTF] and was authored in large part by Intel.) This isn’t the place for us to dive into a discussion on remote management, but suffice it to say that, come February, if your AMD commercial desktops aren’t going out with the ability to support DASH, you should seriously examine your long-term business strategy. Remote management may turn out to be the must-have feature on all commercial PCs in 2008 and could even make inroads with consumers.

The entire 7 Series is compatible with the SB700/750 and will eventually pair with it. Just be aware that early release motherboards from the next few months are likely to use today’s SB600.


Four Cores, Seven Months Ago
From its early public appearances, enthusiasts have been clamoring for the promised benefits of AMD’s fully integrated quad-core design. To get all of the benefits, though, use an AM2+ motherboard.

PHENOM...FOR REAL

There’s no point in sugar-coating it. We know that, after all the hype and hoopla for AMD’s Barcelona launch in the server world, channel availability of the quad-core processors has been sparse at best. Given that the new Opteron and its desktop counterpart, Phenom, are essentially the same chip with some minor twists, one can’t help but wonder about availability of the consumer parts heading into the holiday season. As of this writing, just before the official Spider launch, we have yet to hear reports of stock piled on distributors’ shelves, ready to spill out into the channel. That said, we’re encouraged that so many Asian board partners are breaking NDA in order to promote their SKUs in advance of the CPU’s launch. This is something we normally don’t see in the face of poor CPU availability. Also, while AMD remains vague when confronted about allocation questions, the tone of replies from multiple inside sources is more upbeat than we heard at this point in Barcelona’s cycle.

“Phenom and Barcelona share a common underlying structure,” says AMD’s Gary Bixler. “So all of the work that went into enabling us to build and sell Barcelona affects and helps Phenom. All of that tuning, ramping, and work that went into bringing Barcelona to market only works in Phenom’s favor. From a channel availability standpoint this quarter, I think we’re going to have a lot of product to sell. I feel good about it, and I think the channel should too.”

Since we recently gave Barcelona a fairly deep technical examination (issue 71), we won’t rehash all of that here with Phenom, but the basics are worth refreshing. Phenom is AMD’s first native quad-core desktop processor. Unlike the multi-chip module design employed in Intel’s Core 2 Quad, which places two dual-core dies in one CPU package, AMD integrates all four cores into one die. You will hear AMD trumpet ad infinitum that this is a superior design that can deliver better performance with lower latencies. We’re not going to dredge up the benchmark battles here. As we saw in the Opteron vs. Xeon face-offs, each camp has its own advantages and wins its own races. It falls to resellers to pick the best chip for the job.

That disclaimer aside, yes, there’s no question that AMD has the simpler, more elegant design. Rather than have to make lengthy trips through the front-side bus to the northbridge and back in order to have one core query another, Phenom’s unified design allows all cores to communicate on-die through an arbitration layer. Intel works around this problem by piling ever more L2 cache onto each die so that fewer off-die queries are needed. Whereas a Core 2 Quad plants 4MB of L2 cache on each dual-core die (8MB total in the package), Phenom uses 512K of L2 cache on each core backed by 2MB of L3 cache shared across all four cores. This design has left some people wondering why AMD opted for a larger L3, which is intrinsically slower than L2, instead of eliminating the L3 and opting instead for a larger, shared L2.

How to Stand Alone
Historically, OEMs offered Quad FX boxes with both processors installed, but you may find success in only offering one Phenom FX chip, keeping the ASP lower and the upgrade path open.

“AMD, with having the memory controller on-die, is not cache-dependent,” explains AMD’s McNaughton. “Intel is 100% cache-dependent to offset the slow front-side bus and the latency that exists within their architecture. We do not need the cache to deliver the performance. L3 cache is really an extension of the northbridge per se. We don’t add more L2 cache because we don’t need it. It’s not going to reduce latency and increase performance. It would just sit there and cost money.”

Apparently, AMD must be onto something smart, because recent news has revealed that Intel’s next-generation Nehalem architecture for late 2008 will shift the 12MB of L2 cache in today’s just-launched Penryn chips into 8MB of L3 shared across all four unified cores, mirroring AMD’s move one year later.

Phenom’s embedded dual-channel memory controller will handle up to DDR2-1066, and the controller is smart enough to still optimize performance across mismatched modules. The Phenom line will transition to DDR3 support when the processor migrates from 65nm production to 45nm. (Obviously, moving to DDR3 will require a new processor whereas an Intel platform would require a motherboard change.) With Phenom, AMD also improved on Athlon’s physical memory address space. The platform now supports 48-bit addressing, allowing a maximum of 256TB of system memory. Admittedly, that’s a lot to squeeze out of four DIMM slots on a desktop board, but it’s all about scaling, right?
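
The 256TB ceiling follows directly from the address width. As a quick sanity check, here is a minimal Python sketch. The function name is ours, and it assumes byte-addressable memory and binary units (1TB taken as 2^40 bytes):

```python
def max_addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with the given address width (byte-addressable)."""
    return 2 ** address_bits


# Assumption: "TB" in the text means a binary terabyte (2**40 bytes).
TB = 2 ** 40

if __name__ == "__main__":
    limit = max_addressable_bytes(48)
    print(f"48-bit addressing reaches {limit // TB} TB")
```

Each extra address bit doubles the ceiling, which is why a 48-bit address space can cover 256TB.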

As we’ve seen with Barcelona, Phenom is backward compatible with prior-generation motherboards. A Phenom chip that uses the AM2+ socket will fit into and work with an AM2 motherboard. The thing to watch for here is not compatibility (the parts mix and match just fine) but full utilization of two key platform elements. The first is HyperTransport 3.0. Of the 20.8 GBps of raw bandwidth available in the pipeline, Phenom uses 16.0 GBps. (Presumably, this leaves a fair margin for overclockers to dabble with.) You can put a Phenom chip on a HyperTransport 1.0 or 2.0 bus, but the potential for bottlenecking increases accordingly.
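
For readers who want to see where bandwidth figures like these come from, the arithmetic can be sketched in a few lines of Python. The parameters are our assumptions, not AMD-quoted figures from this article: a 16-bit link in each direction, data transferred on both clock edges, and a 2.6 GHz link clock for the 20.8 GBps ceiling. The function name is purely illustrative.

```python
def ht_aggregate_gbps(link_clock_ghz: float, link_width_bits: int = 16) -> float:
    """Aggregate (both directions) HyperTransport bandwidth in GB/s.

    Assumes double-data-rate signaling and one link of the given width
    in each direction.
    """
    transfers_per_ns = link_clock_ghz * 2        # two transfers per clock cycle
    bytes_per_transfer = link_width_bits / 8     # 16 bits -> 2 bytes
    per_direction = transfers_per_ns * bytes_per_transfer
    return per_direction * 2                     # upstream plus downstream


if __name__ == "__main__":
    # 2.6 GHz is our assumed ceiling clock; 2.0 and 1.8 GHz are link speeds
    # mentioned elsewhere in the article.
    for clock in (2.6, 2.0, 1.8):
        print(f"{clock} GHz link: {ht_aggregate_gbps(clock):.1f} GB/s aggregate")
```

Under these assumptions, a 2.0 GHz link works out to the 16.0 GBps quoted for Phenom, and the gap up to the 2.6 GHz ceiling is the headroom mentioned above.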

Four Slots of Fury
CrossFire X allows for four AMD cards to combine and conquer virtually any graphical task. Note the twin data connectors on the card tops, a feature absent in NVIDIA’s dual-card SLI approach.

The second element is split plane power support, more officially known as Dual Dynamic Power Management. Up to now, one power plane would serve both the cores and memory controller within an AMD CPU. Split plane functionality allows Phenom to split that plane, sending one power link to the processor cores and the other to the memory controller. This way, the power levels for the cores and memory controller can be adjusted independently, giving a new level of possibility in the OverDrive settings. Not only can users tweak memory controller clock speeds and achieve lower memory latency, but they can also better adjust power consumption to fit their needs.

Admittedly, you probably won’t see many end-users wanting to get hands-on at this deep level, but some will. In any case, it’s a great bullet point with which to illustrate Phenom’s flexibility. Just be aware that, in order to get the benefits of split plane, Phenom buyers will need a motherboard that also supports the feature. This means boards with either an AM2+ or F+ socket.

Continuing in the power vein, don’t ignore Phenom’s other power-saving enhancements. AMD’s Independent Dynamic Core Technology is a lengthy name for the ability of each core to dynamically adjust its power usage based on its own utilization level, an improvement over the traditional method in which all cores chew through power based on the level of the core with the highest utilization. CoolCore Technology works by powering down pieces within each core when they’re going unused. This could mean everything from large core areas to individual transistors. And there are other innovations designed to drop power consumption and protect the processor from overheating, both in localized hot spots and core-wide. The end result is summed up under the name Cool’n’Quiet 2.0. Barcelona made a lot of noise about fitting in the same thermal/power envelope as its preceding Opteron Revision F part. Phenom can’t quite make the same claim, but when you figure that AMD here fits four cores into a 95W envelope where it previously fit two cores into 89W, and that it introduces the many enhancements of CnQ2.0, we think Phenom’s energy argument is extremely persuasive.
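
To put rough numbers on that energy argument, here is a back-of-the-envelope Python sketch. It divides the package TDP evenly across cores, which is a simplification on our part: TDP is a package-level rating that also covers the memory controller and cache, so treat the result as an illustration rather than a measurement.

```python
def tdp_per_core(tdp_watts: float, cores: int) -> float:
    """Naive per-core share of a package TDP (ignores uncore power)."""
    return tdp_watts / cores


if __name__ == "__main__":
    quad = tdp_per_core(95, 4)   # Phenom figures from the text
    dual = tdp_per_core(89, 2)   # prior dual-core figures from the text
    saving = 1 - quad / dual
    print(f"Quad-core: {quad:.2f} W per core")
    print(f"Dual-core: {dual:.2f} W per core")
    print(f"Per-core budget reduction: {saving:.0%}")
```

By this crude measure, each Phenom core gets a little over half the thermal budget of its dual-core predecessor.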

On November 19th, AMD launched its first two Phenom SKUs, the 2.2 GHz Phenom 9500 and 2.3 GHz Phenom 9600. Both are 95W parts with 4x512K of L2 cache and 2MB of L3. Some were hoping for higher clock speeds at launch, but these parts beat early fears in the 2.0 GHz range, and early word on the Web is that Phenom offers tons of overclocking headroom. So combine this with OverDrive, and we’re not worried. By the time you read this, AMD will probably also have its Phenom 9700 part out, which will hit 2.4 GHz but in a 125W TDP. In part, this is because the 9700 will use a 2 GHz (each direction) HyperTransport connection rather than the 1.8 GHz link of the 9500 and 9600.

The Phenom 9000 family refers to AMD’s quad-core desktop series. In Q1 of 2008, look for the debut of the triple-core Phenom 8000 series. Yes, three seems odd, but the answer is simple enough. Just as we’ve seen Intel do with its own multi-core designs when there’s a value niche to fill, AMD here takes 9000-series Phenoms with a defect in one core, disables that core, and sends the chip out with three active cores. Given that the bulk of mainstream buyers are still more interested in the basic platform advantages of Spider than they are with whether their CPU has three cores or four, the 8000 family, if it reaches the right price points on arrival, could be a key channel play come springtime. Expect launch parts in the 2.3 to 2.5 GHz range with an 89W TDP, 2MB of L3 cache, and, of course, 3x512K of L2.

Wrapping around the calendar into 2008, look for Phenom FX to arrive. Just like the Athlon 64 FX, Phenom FX CPUs will be the highest-performance parts aimed at enthusiasts. First up will be the FX-80 series, which will run on the AM2+ socket and is thus part of the Spider platform. As of this writing, there’s still anticipation of an FX-90 series for Socket F+, but because this socket falls outside of Spider, no additional details are forthcoming from AMD, and we wouldn’t be surprised if the FX-90 parts never saw the light of day. Either way, just keep in mind that these are the CPUs you’ll be needing for your Quad FX clients in 2008.

Success With Less
MSI’s K9A2 Platinum is a 790FX-fueled board that delivers exceptional performance and functionality without the extremity of ASUS. This unit still features everything from eSATA to TOSLINK to quad CrossFire X.

RADEON GOES SERIOUSLY HD

You see the term “HD” tossed around Spider marketing like mad. Like your favorite expletive, it seems to apply in almost any context to mean just about anything. We normally think of HD in terms of video: 1920x1080 and all that. AMD takes a little looser interpretation. With Spider, HD means high resolution, the bandwidth to drive that resolution, and the technologies to make whatever is being displayed in that resolution richer. You could point to the wider pipelines in the 7 Series, Phenom’s quad-core architecture, and embedded HDCP circuitry in AMD’s IGPs as previously covered examples. But now we turn to the new ATI Radeon HD 3800 series of GPUs for an even clearer representation.

If we step back in time a few months, you’ll recall the arrival of the ATI Radeon HD 2000 series. (And in case you missed that, check out the RAM TV video on the subject at www.reselleradvocate.com/public/ram/rampage/rampageTV_hd2000.html.) More precisely, we had the arrival of the R600 graphics core, which debuted first in the 80nm HD 2900 XT before being followed by the 65nm HD 2600 and HD 2400 groups. Not coincidentally, the 65nm SKUs integrated features that were missing from the 80nm HD 2900s, most importantly UVD. (See below.) The reasoning seems clear enough: The larger fab size left less room for decode circuitry. Plus AMD was on a release schedule time crunch. Plus the R600 (HD 2900) taped out before the RV610 and RV630 (HD 2400 and 2600). Add it up and you were left with a 2900 series targeted at cost-minded but high-end gamers rather than multimedia and home entertainment buffs. The HD 2900 was a compromise product that had to hit a deadline.

The processing architecture of the HD 3800 nearly mirrors that of the HD 2900. At its heart, the 3800 boasts 320 stream processing units, those flexible, programmable shader cores that do away with the old model of fixed vertex or pixel shaders. HD 3800s have 16 texture units, 16 render back-ends, and a programmable tessellator unit. ATI’s ring bus memory design carries forward, but note that whereas the HD 2900 Pro/XT had a 512-bit bus within each ring direction, the new HD 3800 parts offer only a 256-bit bus. This would seem to indicate the possibility of a higher-end part waiting in the wings for Q1. There should certainly be room to scale the core clock up. The HD 2900 XT ran at 743 MHz while the HD 3870 defaults to only 775 MHz. Given that the HD 3800 series makes the critical leap to 55nm fab production from the HD 2900’s 80nm, AMD should have plenty of room to ratchet upward. As of the November Spider launch, the raw performance of these two families is nearly identical, but the HD 3870 accomplishes in 192 square millimeters what the HD 2900 did in 408. Better yet, AMD is positioning the new parts at half the price of their high-end predecessors.

So now we have to ask: Is the HD 3800 a high-end part because it replaces the former ATI champ, or is it a midrange part because of the $150 to $250 price band? We’re not really sure. Regardless of why AMD is electing to avoid a direct confrontation with NVIDIA for possession of today’s benchmarking crown, we can’t argue with the resulting channel opportunity. In keeping with its new graphics policy of “overdeliver, underprice,” AMD’s HD 3870 (according to AMD’s own benchmarks) delivers roughly 5% to 35% better performance across a broad range of games than NVIDIA’s GeForce 8800 GTS, and does it for an average of at least $50 less. We don’t know if the HD 3870 will benchmark higher than an 8800 GTX. We do know that, at over $500 a pop, channel resellers probably aren’t moving a lot of GTXs. AMD is supplying more persuasive value at the mid-level price points system builders can best capture.

Naturally, you don’t want to get into a benchmark battle discussion with your customers. That’s like reducing CPUs to megahertz. Once you reduce everything to a single metric, you’ve lost the chance to add value and sell an experience rather than a spec. On that note, it’s important to recognize some of the other value points that AMD embeds in its new GPUs.

Passive Aggression
Sapphire’s HD3850 card, with its 512MB of memory, mainstream GPU, and two-sided heatpiping, delivers all the performance and feature enhancements of the new HD 3800 generation at 0 dB.

DIRECTX 10.1

This may or may not be a notable point, depending on your audience. Briefly, DirectX is a collection of application programming interfaces (APIs) designed to process and accelerate multimedia tasks. As DirectX (and its rival API set, OpenGL) has evolved, computer-generated real-time worlds have grown ever more realistic. In order to realize the most benefit from the latest version of DirectX, the application, operating system, and display adapter all have to support that “DX” generation.

DirectX 10.1 is an incremental update to version 10.0. Version 10, in turn, was a significant jump from DirectX 9, which lacked the ability to support unified shaders. With 10.1, there are only a handful of updates. The most major is the implementation of cube mapping in order to enable global illumination. Global illumination refers to how light reflects from one object to another within a scene. Hold a red can of Coke up to a white sheet of paper and you’ll see how the light bouncing off the can starts to give the paper a diffused, red cast. Similarly, the side of the can closest to the paper may become slightly brighter from the bounced white light. Not only do scenes with such illumination look more authentic, the amount of bounced light we see influencing nearby objects is one way in which our brains intuitively gauge the distance between them. There are a number of techniques that can create global illumination, but until now they’ve been so compute intensive that they couldn’t be run in real-time. DirectX 10.1 remedies that through cube mapping.

Cube mapping works by chunking a scene down into an array of cubes. Imagine a bug stuck in the middle of an ice cube. The bug can see a simplified view of the environment around him through each of the six cube faces. In tech jargon, the bug’s perspective in the cube’s center is called a “light probe,” and the set of six views he sees through the faces is a “cube map.” The system can then compute the amount and color of light hitting the cube from all directions. Figure in an entire scene full of cubes and you suddenly have the ability to determine how light should be represented as it bounces from one object to another, including in glossy reflections.
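
The lookup step at the heart of the bug-in-a-cube analogy can be sketched in Python. This is a simplified illustration under our own assumptions, not DirectX code and not how any particular GPU implements it: each face stores a single average RGB color (real cube maps store a full image per face), and the function and variable names are hypothetical.

```python
def cube_face(direction):
    """Return which of the six cube faces a direction vector points through.

    The face is chosen by the axis with the largest absolute component.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == ay == az == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"


def sample_probe(face_colors, direction):
    """Look up the light arriving at the probe from the given direction."""
    return face_colors[cube_face(direction)]


if __name__ == "__main__":
    # Assumption: one averaged RGB color per face. A red can sits off to
    # the +x side of the probe; everything else is white paper.
    white = (1.0, 1.0, 1.0)
    probe = {face: white for face in ("+x", "-x", "+y", "-y", "+z", "-z")}
    probe["+x"] = (0.9, 0.1, 0.1)

    # A surface facing mostly toward +x picks up the red bounce light.
    print(sample_probe(probe, (0.9, 0.2, -0.1)))
```

Because the lookup needs only a comparison of three components, it is cheap enough to repeat for every probe in a scene, which is what makes the approach practical in real time.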

A couple of other changes are also worth noting. DirectX 10.1 updates the Shader Model support from 4.0 to 4.1. Previously, no multi-sample antialiasing was required; now at least 4X MSAA is required. The custom antialiasing filters introduced with the Radeon HD 2000 series can now be implemented as pixel shaders in cases where MSAA might be a liability. And so on. Users won’t be able to enjoy the benefits of global illumination and DX10.1’s other improvements until the Windows Vista SP1 release early in 2008, but with the right hardware in hand, at least they’ll be prepared.



Copyright © 2007 RAM Magazine. All rights reserved.
Do not duplicate or redistribute in any form.