Last night, ARM officially unveiled its Cortex-A78 and Cortex-X1 architectures, with the former positioned as a big core and the latter as an extra-large core.
According to ARM, the Cortex-A78 improves performance by 20% while cutting power consumption by 50% on a 5 nm process.
In practice, because the A78 belongs to the same Austin microarchitecture family as the A76/A77, its IPC is only about 7% higher, its power consumption 4% lower, its core 5% smaller, and its quad-core cluster area 15% smaller.
The 20% performance and 50% power figures quoted above come mostly from the higher clock frequency and the newer process.
According to official data, a Cortex-A77 on a 7 nm process can reach 2.6 GHz, while an A78 on a 5 nm process can reach 3 GHz, a sustained performance increase of about 20% at the same power consumption.
Conversely, a Cortex-A78 core built on a 5 nm process and clocked at 2.1 GHz consumes 50% less power than a Cortex-A77 core built on a 7 nm process delivering the same performance.
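As a rough sanity check on how much of the headline gain comes from clock speed, the quoted numbers can be combined in a few lines of Python. This is only a sketch: it assumes performance scales as IPC times frequency, which is a simplification and not ARM's methodology, and the helper name `relative_gain` is made up for illustration.

```python
def relative_gain(new: float, old: float) -> float:
    """Return the fractional improvement of new over old (0.15 means 15%)."""
    return new / old - 1.0


# Figures quoted in the article.
a77_ghz = 2.6    # Cortex-A77 peak clock on 7 nm
a78_ghz = 3.0    # Cortex-A78 peak clock on 5 nm
ipc_gain = 0.07  # A78 IPC improvement over the A77

# Gain attributable to clock frequency alone.
freq_gain = relative_gain(a78_ghz, a77_ghz)

# Assumption (not from ARM): performance is proportional to IPC x frequency.
combined_gain = (1.0 + ipc_gain) * (1.0 + freq_gain) - 1.0

print(f"frequency gain:       {freq_gain:.1%}")
print(f"IPC x frequency gain: {combined_gain:.1%}")
```

Under that assumption the clock bump alone is worth roughly 15%, more than twice the IPC gain, which supports the point that most of the improvement comes from frequency and process. The combined figure need not match ARM's 20%, since that number is quoted for sustained performance at equal power and not for peak clocks.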
Is the new big core's performance gain a little disappointing? Don't worry, because ARM has also released a new extra-large core, the Cortex-X1.
The Cortex-X series is ARM's new high-performance core line. Its first product, the Cortex-X1, improves performance by 30% over the A77 and 22% over the A78, and improves machine learning performance by 100%.
Additionally, the Cortex-X1 allows for customization to create more varied functionality, but this requires customer involvement early in the development process.
Compared to the A78, the Cortex-X1 widens instruction decode from 4-wide to 5-wide, a 25% increase, and expands the NEON floating-point units from two 128-bit pipelines to four, which corresponds to a doubling of floating-point throughput.
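The front-end and NEON figures follow directly from the widths quoted above, as this small Python sketch shows. The function name `simd_bits_per_cycle` is hypothetical, and the calculation assumes every pipeline can be kept busy each cycle, so it describes peak width and not measured throughput.

```python
def simd_bits_per_cycle(pipes: int, width_bits: int) -> int:
    """Peak SIMD width per cycle, assuming every pipeline issues each cycle."""
    return pipes * width_bits


# Decode widths quoted in the article (instructions per cycle).
a78_decode, x1_decode = 4, 5
decode_gain = x1_decode / a78_decode - 1.0

# NEON configurations quoted in the article: pipelines x 128-bit width.
a78_neon = simd_bits_per_cycle(2, 128)
x1_neon = simd_bits_per_cycle(4, 128)
neon_ratio = x1_neon / a78_neon

print(f"decode width gain: {decode_gain:.0%}")
print(f"NEON bits per cycle: A78 = {a78_neon}, X1 = {x1_neon}, ratio = {neon_ratio:.1f}x")
```

Peak figures like these are upper bounds. The overall gain ARM quotes for the X1 over the A78 is 22%, well short of a doubling, because real workloads rarely saturate the floating-point units.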
In terms of cache, the Cortex-X1 has 64 KB of L1 cache, 1 MB of L2, and 8 MB of L3, which is twice as much as the Cortex-A78.
For core pairing, the Cortex-X1 can be combined with the Cortex-A78 and Cortex-A55 in a three-tier design (an X1 tier, an A78 tier, and a quad-core A55 tier), covering extra-large, big, and small cores.
Paired with the 1 MB of L2 and 8 MB of L3 cache that ARM specifies, these cores form a DynamIQ cluster. With four Cortex-A78 cores and 4 MB of L3 cache, performance is 20% higher than the previous generation while core area is 15% smaller. With one Cortex-X1 plus three Cortex-A78 cores and 8 MB of L3 cache, core area grows by 15%, but peak performance rises by 30%.
In short, the Cortex-X1 offers a significant performance increase, but its core area is much larger, which means more transistors, higher cost, and higher power consumption. Whether that is worthwhile depends on the trade-offs each chip vendor makes.








