Xilinx has released a press release with some details on its recent acquisition of High-Level Synthesis vendor AutoESL. AutoESL's AutoPilot tool will now be integrated into the Xilinx toolchain. And, you guessed right, there will no longer be an Altera back-end.
This acquisition is a very interesting development, and one that is likely to have many consequences. First, there are currently many players in the C-to-gates world competing for a slice of the FPGA design market (AutoESL, Pico -now part of Synopsys-, Harwest CE, Impulse-C, Catapult C, etc.). With one of these tools now part of the standard Xilinx toolchain, one wonders what future is left for the other products within the Xilinx world. Second, with a C-to-gates tool already present in the standard Xilinx toolchain, it is likely to establish a standard coding style for C programs that are to be translated into Xilinx FPGA hardware. IMHO this would be good, as it would mean increased compatibility and portability. And finally, particularly if it is included in the Xilinx University Program, it is likely to considerably increase the user base of C-to-gates tools.
As always, only time will tell, but it seems there will be some interesting movements in the future. It will also be interesting to see if Altera is going to follow a similar route now.
Thursday, February 3, 2011
Friday, January 7, 2011
NVIDIA to follow the ARM path
During CES, NVIDIA announced plans to integrate its GPU cores with ARM cores to create a new family of heterogeneous devices, much along the lines of AMD Fusion or the Xilinx Extensible Processing Platform. The first such heterogeneous device will apparently be part of the 2013 'Maxwell' architecture. Since NVIDIA's Tegra line is already based on ARM cores, this means that all of NVIDIA's chips will now feature ARM general-purpose cores. There is some more information in this HPCwire blog post.
Monday, November 1, 2010
Achronix-Intel Partnership for FPGA fabrication
While sitting on my couch on this Nov 1 morning, I was struck by this eetimes piece mentioning an agreement between Intel and Achronix by which Intel will fab Achronix' Speedster22i FPGAs in its upcoming 22nm node technology. Achronix, as you might know, develops an alternative "asynchronous" FPGA technology that allows it to reach very high clock frequencies. I had lost track of this startup for the last few months, but this agreement suggests it is worth taking another look at their technology.
Now, while the deal is great for Achronix, what it means for Intel is even more intriguing. Intel has never fab'ed for a third-party vendor (AFAIK). As most comments on the eetimes article suggest, it may be the case that Intel plans to acquire Achronix and start its own FPGA business based on this technology. Having Intel enter the FPGA business would certainly give Xilinx and Altera execs some headaches in keeping their market share intact. But why would Intel want to enter this business? Maybe the key lies in the Intel Atom chip. A reconfigurable system-on-chip device based on Atom technology might be a highly desirable part, similar to Xilinx's Extensible Processing Platform based on hard ARM cores.
Sunday, September 26, 2010
The future of the low-cost GPU landscape
I came across this article in the International Science Grid This Week (iSGTW), where the author speculates that the introduction of chips such as Sandy Bridge (Intel) and Llano (AMD) in mid-2011 may mean the end of cheap supercomputing based on GPUs. It is an interesting point of view, but I do not completely agree. The argument is that these chips will eat a large piece out of the low-end GPU market that NVIDIA currently holds. With the revenues of that segment reduced, the higher-end CUDA devices will increase in cost (the author speculates on a 10x figure based on a QS22-to-PS3 price comparison). Thus, he concludes, the era of cheap GPU-based supercomputing is coming to an end.
Even if this were to happen, the performance difference between lower-end and higher-end GPU devices is probably much smaller than 10x. Nothing precludes institutions from building GPU-based clusters out of smaller, cheaper devices (i.e., the same philosophy as Blue Gene). Thus it seems to me the higher price tag would still not have to be paid. On the other hand, in this scenario we might also see clusters built from Sandy Bridge and Fusion chips, which would fall into this very same category. It will be interesting to observe how this plays out.
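To make the argument concrete, here is a back-of-the-envelope sketch in Python. All prices and FLOPS figures below are hypothetical placeholders chosen only to illustrate the reasoning, not actual market data:

```python
# Back-of-the-envelope FLOPS-per-dollar comparison.
# All numbers are hypothetical placeholders for illustration only.

def flops_per_dollar(peak_gflops, price_usd):
    """Peak GFLOPS delivered per dollar spent."""
    return peak_gflops / price_usd

# Hypothetical high-end GPU: 10x the price of a low-end card,
# but (per the argument above) much less than 10x the performance.
high_end = flops_per_dollar(peak_gflops=1000, price_usd=2000)
low_end = flops_per_dollar(peak_gflops=400, price_usd=200)

# In this scenario a cluster of cheap devices wins on raw FLOPS/$,
# which is why the 10x price tag would not have to be paid.
print(f"high-end: {high_end:.2f} GFLOPS/$")  # 0.50 GFLOPS/$
print(f"low-end:  {low_end:.2f} GFLOPS/$")   # 2.00 GFLOPS/$
assert low_end > high_end
```

Of course, a cluster of many small devices pays its own tax in interconnect and power, so the comparison is only a first-order one.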
Monday, September 13, 2010
Towards an International EDA Roadmap?
Just wanted to share this piece on the necessity for the different players in the EDA market to start thinking more seriously about tool interoperability. As tool complexity and design complexity increase, this seems to be a natural direction.
Tuesday, July 13, 2010
GRAPE-DR ranks 1st on Little Green500
It has been a while since I last heard from the GRAPE-DR project. This week it was finally announced that a recently installed GRAPE-DR system holds the top spot on the Little Green 500 list, boasting a remarkable 815 MFLOPS per Watt. GRAPE-DR is a generalization of the GRAPE systems that professor Jun Makino had been developing for astrophysical simulations. A paper on GRAPE-DR had been presented at Supercomputing'07; I still remember reading it back then. The new system has been designed in collaboration with professor Kei Hiraki from the University of Tokyo, whom I had the pleasure to meet at ISCA'04 in Munich. Glad to see the system has finally been built. Although the current system is by no means the 2 PFLOPS machine that the authors intended to build by 2008, 23 TFLOPS for a 64-node machine is still quite a remarkable feat. Every node has one accelerator board holding 4 GRAPE-DR chips, each delivering 200 GFLOPS double precision in just 50 Watts. More information can be found here.
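The figures quoted above can be sanity-checked with a quick calculation. Note that the efficiency ratio below is derived here for illustration; it is not a number from the announcement:

```python
# Sanity-check the GRAPE-DR figures quoted above.
nodes = 64
chips_per_node = 4        # one accelerator board with 4 GRAPE-DR chips
gflops_per_chip = 200.0   # double precision
watts_per_chip = 50.0

peak_tflops = nodes * chips_per_node * gflops_per_chip / 1000.0
print(f"peak accelerator throughput: {peak_tflops:.1f} TFLOPS")  # 51.2

# The reported 23 TFLOPS is thus roughly 45% of accelerator peak.
sustained_tflops = 23.0
print(f"sustained/peak: {sustained_tflops / peak_tflops:.0%}")  # 45%

# Chip-level efficiency is 4 GFLOPS/W; the system-level Little Green500
# figure (815 MFLOPS/W) is lower because it also counts host CPUs,
# memory, network, and power-conversion losses.
print(f"chip-level: {gflops_per_chip / watts_per_chip:.0f} GFLOPS/W")
```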
Friday, July 2, 2010
GPU vs CPU: Intel debunks Nvidia?
It is quite uncommon for an academic paper at a major computer architecture conference to attract so much attention. But Intel's paper comparing (or "debunking") GPU performance against CPU performance (in terms of speed-ups) has apparently caused quite a stir. Several forums, including Linux Magazine, Beyond3D, and even NVIDIA's own forums, have reported on this paper.
While GPUs often report extraordinary peak performance in terms of FLOPS rate, many applications are actually limited mainly by memory bandwidth. If you compare the available memory bandwidths of CPUs and GPUs, it makes sense that reported speed-ups are in the vicinity of 1-10x. After all, adding more compute units to increase theoretical peak FLOPS rate (as Moore's Law permits) is apparently simpler than increasing the memory bandwidth. There are reports already claiming 128 GFLOPS for an Intel Sandy Bridge (SB) processor with 4 cores. High throughput for such a processor can only be sustained if the application offers enough data reuse in the caches (the same holds for GPUs). No reuse at all would require 2 TB/s (!) of input bandwidth to keep all these SB units busy. Expect future systems' performance to be increasingly dominated by external bandwidth.
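The 2 TB/s figure follows from straightforward arithmetic, assuming the worst case in which every floating-point operation consumes two fresh double-precision inputs from memory (no cache reuse at all):

```python
# Input bandwidth needed to feed peak FLOPS with zero cache reuse.
# Worst-case assumption: each FLOP reads two fresh 8-byte operands.
peak_gflops = 128.0      # claimed 4-core Sandy Bridge figure
bytes_per_flop = 2 * 8   # two double-precision inputs per operation

required_gb_s = peak_gflops * bytes_per_flop  # 2048 GB/s
print(f"required input bandwidth: {required_gb_s / 1000:.1f} TB/s")  # 2.0 TB/s

# Actual DDR3 bandwidth is on the order of tens of GB/s, two orders of
# magnitude short -- hence sustained throughput hinges on cache reuse.
```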