Following our earlier coverage of Day 2 of Qualcomm’s Tech Summit event in Maui, Hawaii, we recap the major story of the day: the new Snapdragon 855 platform. The new platform follows this year’s extremely successful Snapdragon 845 SoC, which powered the vast majority of 2018’s flagship devices.
Qualcomm isn’t standing still, and the Snapdragon 855 represents a new generation, bringing a refresh of the SoC IPs as well as a brand new 7nm manufacturing process. Let’s delve into today’s details and analyse how the new SoC platform will raise the bar for 2019.
The Finer Details
|Qualcomm Snapdragon Flagship SoCs 2018-2019|
|SoC||Snapdragon 855||Snapdragon 845|
|CPU||1x Kryo 485 Gold (A76 derivative) @ 2.84GHz 1×512??KB pL2; 3x Kryo 485 Gold (A76 derivative); 4x Kryo 485 Silver (A55 derivative); 4?? MB sL3||4x Kryo 385 Gold (A75 derivative) @ 2.8GHz 4x256KB pL2; 4x Kryo 385 Silver (A55 derivative)|
|GPU||Adreno 640 @ ?MHz||Adreno 630 @ 710MHz|
|Memory||4x 16-bit CH @ 2133MHz; 5MB system level cache||4x 16-bit CH @ 1866MHz; 3MB system level cache|
|ISP/Camera||Dual 14-bit Spectra 380 ISP; 1x 48MP or 2x 22MP||Dual 14-bit Spectra 280 ISP; 1x 32MP or 2x 16MP|
|Video||2160p60 10-bit H.265; HDR10, HDR10+, HLG||2160p60 10-bit H.265|
|Integrated Modem||Snapdragon X24 LTE; DL = 2000Mbps; UL = 316Mbps||Snapdragon X20 LTE; DL = 1200Mbps; UL = 150Mbps|
|Mfc. Process||7nm (N7)||10nm LPP|
At the heart of the new Snapdragon 855 lies Arm’s new Cortex-A76 CPUs: We’ve covered the new microarchitecture extensively this year as we dove into the technical tidbits of the CPU in May, and more recently have been able to deep dive test the performance and power efficiency of the new IP inside HiSilicon’s new Kirin 980. The combination of the new Cortex-A76 with the new 7nm manufacturing node made for great leaps in performance and power efficiency, something that bodes very well for the new Snapdragon 855.
Qualcomm’s take on implementing the new Cortex-A76 cores is quite a bit different than what we’ve seen from HiSilicon. Overall there are still four Cortex-A76 derived cores (Kryo 485 Gold as Qualcomm markets them), alongside four Cortex-A55 derived CPUs. The differences here lie in the frequencies, the apparent cache configurations, as well as apparent changes in some microarchitectural tuneables.
Interestingly, for the first time since Qualcomm adopted Arm’s “Built on Cortex Technology” license, and with the third iteration of its implementation, Qualcomm has finally released details on the kind of changes to the IP that it has commissioned from Arm. Here Qualcomm reveals that the Cortex-A76 variation in the Snapdragon 855 allows for a bigger out-of-order execution window, most likely referring to an increase in the size of the reorder buffer. The stock A76 has a 128-entry reorder buffer, whereas Qualcomm's modified A76 increases this to an undisclosed size.
Alongside what seems to be the ROB increase, Qualcomm has also revealed that the data prefetchers have been optimised for better efficiency. It’s not clear if the “efficiency” here refers to power efficiency or the efficiency in the way data is prefetched, nor do the disclosures say what exactly has changed, whether there are more or fewer prefetch streams or whether there have been changes in the other types of prefetchers.
While HiSilicon opted for a 2+2 design, where one pair of A76’s were optimised for high frequencies and the second pair were optimised for higher power efficiency, Qualcomm opted to go with a 1+3 configuration.
The highest performance core, “Kryo 485 Gold Prime” as Qualcomm calls it, is clocked in at 2.84GHz – putting it on its own clock domain – and is seemingly configured with a 512KB L2 cache. The other three cores are clocked at 2.42GHz and retain smaller 256KB L2 caches. This configuration is quite odd – you also would expect Qualcomm to take advantage of the new DynamIQ cluster design, which is able to support different frequency and voltage planes, however things get even odder. The prime core actually doesn’t have its own power plane, and thus it has to share its power plane with the other three big cores.
This revelation of the prime core not having its own power domain is quite shocking and it invalidates a lot of the benefits of actually having a separate clock plane for a core. In effect the real-world benefit here isn’t any different than simply clock-gating the core.
It is true that there’s a large number of scenarios where there’s predominantly a single larger thread active; this is particularly true in web browsing workloads. Such a 1+3 configuration would achieve better performance and possibly better efficiency than a 2+2 configuration, but because the cores aren’t running on separate voltage planes, the actual benefits in real-world applications are going to be quite minor. The net result is that when the prime core gets powered up, the other big cores have to follow, regardless of whether there's any work for them to do.
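To put a rough shape on the shared-plane penalty, here is a minimal sketch using the textbook CMOS approximation that dynamic power scales with C·V²·f. The voltage values are purely hypothetical placeholders rather than Qualcomm figures, and leakage is ignored; the point is only that cores clocked at 2.42GHz still pay for the voltage that the 2.84GHz prime core demands.

```python
def dynamic_power(voltage_v: float, freq_ghz: float, capacitance: float = 1.0) -> float:
    """Textbook CMOS dynamic power approximation, P ~ C * V^2 * f (arbitrary units)."""
    return capacitance * voltage_v ** 2 * freq_ghz


# HYPOTHETICAL operating voltages, chosen only for illustration.
# Qualcomm has not disclosed the real voltage/frequency curve.
V_AT_2_42GHZ = 0.80  # volts the middle cores might need on their own plane
V_AT_2_84GHZ = 0.95  # volts the prime core might need at its peak clock

# Three middle cores at 2.42GHz on an independent voltage plane
independent = 3 * dynamic_power(V_AT_2_42GHZ, 2.42)

# The same three cores at 2.42GHz, but held at the prime core's voltage
shared = 3 * dynamic_power(V_AT_2_84GHZ, 2.42)

overhead = shared / independent - 1.0
print(f"Extra dynamic power from the shared plane: {overhead:.0%}")
```

With these made-up voltages the three middle cores would draw roughly 41% more dynamic power than on their own plane. The real figure depends entirely on the actual voltage curve, which we won't know until we can measure commercial devices.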
Qualcomm’s 2.84GHz clock is 9.2% higher than HiSilicon’s 2.6GHz frequency. A big question here is just how far Qualcomm has driven the core up on the power curve – I am expecting it to be less efficient than the Kirin 980 by some margin; how big that margin will be is something we won’t see until we get our hands on commercial devices.
Most interesting in today’s presentation is that Qualcomm didn’t make a single concrete mention of the CPU power efficiency of the Snapdragon 855, and I’m not sure if this means there are no improvements or whether the company is just downplaying this aspect of the SoC given the other significant changes.
Lastly, I do find it odd that Qualcomm went for smaller L2 caches on the remaining 3 high performance cores. I still expect these to end up higher performance than HiSilicon’s 1.92GHz A76 units with 512KB L2’s – but it’s nevertheless interesting to see both companies try to achieve the same goal in different ways.
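For reference, the clock comparisons against the Kirin 980 quoted above work out as follows. This is just the arithmetic on the frequencies mentioned in the text; the helper name is my own, and clock speed alone says nothing about the final performance or efficiency.

```python
def uplift_percent(new_ghz: float, old_ghz: float) -> float:
    """Relative clock advantage of new_ghz over old_ghz, in percent."""
    return (new_ghz / old_ghz - 1.0) * 100.0


# Frequencies as quoted in the article (GHz)
prime_vs_kirin_fast_pair = uplift_percent(2.84, 2.60)
middle_vs_kirin_slow_pair = uplift_percent(2.42, 1.92)

print(f"Prime core vs Kirin 980 fast A76 pair:   {prime_vs_kirin_fast_pair:.1f}%")
print(f"Middle cores vs Kirin 980 slow A76 pair: {middle_vs_kirin_slow_pair:.1f}%")
```

The roughly 26% clock advantage of the three middle cores is why I expect them to come out ahead of HiSilicon's second A76 pair despite the smaller L2 caches.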
Moving on, we see the four Cortex-A55 derived efficiency cores, which are running at 1.8GHz and coupled with 128KB L2 caches. In this regard, it seems the Snapdragon 855 doesn’t differ from the Snapdragon 845. Here the company has seemingly put all the process node advantages into improving power efficiency of the little cores.
The DynamIQ Shared Unit’s L3 cache should come in at 4MB – which would be a doubling over the 2MB configuration on the Snapdragon 845. It’s to be noted that we haven’t yet fully confirmed the cache configurations at the time of writing, but I’m strongly leaning towards these figures being correct.
In terms of performance, all that Qualcomm publishes is a claim of up to a 45% performance increase over the Snapdragon 845. As with last year, it’s a bit of a mystery exactly what this figure represents, but the number pretty much falls in line exactly where the Kirin 980 performs in relation to the Snapdragon 845 in SPEC2006. The big question for the S855 is how the new generation system level cache will behave in terms of memory latency, as this will be among the biggest aspects differentiating Qualcomm’s new SoC from its Kirin competition.
Another interesting performance comparison that was published today is a showcase of performance figures between the Snapdragon 855, Apple A12, and the Kirin 980 in terms of app launch times. Though Qualcomm doesn't directly name its competitors, competitors A and B should be the Apple A12 and the Kirin 980 respectively, assuming Qualcomm’s colour scheme is also consistent across the GPU comparisons. For me it’s not too surprising to see the Snapdragon 855 perform this well – one thing I did note in my Huawei Mate 20 review is that the Pixel 3 and OnePlus 3 still felt faster in terms of application launch times, though this could all just be a side-effect of the scheduler and framework of the Snapdragon chipset rather than the raw CPU performance of the hardware. Of course, software still matters immensely, and over the last two years Qualcomm has demonstrated absolute leadership in terms of extracting responsiveness and reactivity out of the hardware through its software designs.
Adreno 640 GPU – Iterative Features and Performance
The Adreno 640 graphics block will be the focus for Qualcomm’s gaming efforts. The company went to great lengths to detail how they felt mobile gaming is on the rise, while other platforms for video games are either stagnating or in decline.
In terms of technical specifications, as is traditional with Qualcomm, we didn’t see much in the way of detailed disclosures on the new GPU. What we did get are more conservative figures, such as a 20% increase in performance. This increase is quite small compared to what we tend to usually see, especially given the fact that the Snapdragon 855 is able to take advantage of a major process node transition.
The Snapdragon 845’s GPU was already the smallest among flagship mobile SoCs at a mere 10.69mm², so unless Qualcomm has significantly increased the number of processing elements inside the GPU cores, this generation should be even smaller. Meanwhile in the event presentation there was one actual titbit about the GPU; Qualcomm is saying that they've increased the number of ALUs for FP32 and FP16 operations by 50%. If my previous estimates about the Adreno 630 were correct, then this would mean the new Adreno 640 sports 192 ALUs per core for a total of 384 ALUs. This ALU increase doesn’t match up with the claimed performance increase, so it’s possible Qualcomm is running the GPU at a lower frequency, or the performance claims were made in regards to possibly less ALU sensitive workloads.
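One way to read the mismatch is sketched below, under the simplifying assumption that GPU throughput scales linearly with ALU count times clock. That assumption is mine, not Qualcomm's, and real workloads are often limited by bandwidth or other units instead. The 710MHz figure is the Adreno 630 clock from the table above.

```python
def implied_per_alu_scaling(perf_gain: float, alu_gain: float) -> float:
    """Per-ALU throughput factor needed to reconcile a performance gain
    with an ALU count increase, ASSUMING perf ~ ALUs * per-ALU throughput."""
    return (1.0 + perf_gain) / (1.0 + alu_gain)


ADRENO_630_CLOCK_MHZ = 710  # from the spec table

# Qualcomm's claims: +20% performance, +50% ALUs
scaling = implied_per_alu_scaling(perf_gain=0.20, alu_gain=0.50)

# If clock speed alone explained the gap (an assumption), the implied clock is:
implied_clock_mhz = ADRENO_630_CLOCK_MHZ * scaling

print(f"Implied per-ALU scaling: {scaling:.2f}x")
print(f"Implied clock if frequency alone explains it: ~{implied_clock_mhz:.0f}MHz")
```

Under that reading the Adreno 640 would be clocked around 20% lower than its predecessor. This is only an illustration of the arithmetic; Qualcomm has not disclosed the actual GPU frequency.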
While the performance gains remain a bit vague at time of writing, Qualcomm did disclose a lot in terms of new graphical features. Here we saw claims that the Adreno 640 graphics in the S855 will enable true HDR gaming, as well as games built around Physically Based Rendering. The graphics pipeline will support 10-bit color depth and the Rec 2020 gamut to enable HDR, as well as enabling S855 devices to support the HDR10+ and Dolby Vision formats, which QC states is a world’s first. With the Adreno 640, along with the display IP, devices can support 120fps gaming as well as smooth 8K 360-degree video playback (resolving a major complaint about Snapdragon-power). Just don’t ask how much space those 8K 360-degree videos take up.
Qualcomm's support for Physically Based Rendering in graphics is an interesting topic, one we’ll go into detail in a different article, but the concept is not new. In fact we're a bit surprised to see it mentioned in the same breath as actual hardware changes, since conceptually it shouldn't require any new hardware; PBR is just a shader program that all of the Adreno 600 family should be able to run.
In any case, the short version is that with this enabled, it will help add realism to gaming and augmented reality through more accurate lighting physics and material interactions. Qualcomm stated that through the Unity 4 engine, developers will be able to use real world materials designed from scientific values created by companies like Quixel and Allegorithmic that will make their environments more lifelike, such as the correct surface roughness / audio reflections or material-on-material interactions. This will also help with lighting and depth perception. More details to come.
The new Hexagon 690 now plays an even bigger role
The new Hexagon 690 is both an iteration on past generation units and brings a totally new aspect to the IP block. The new DSP block doubles up on the number of HVX vector pipelines as compared to the Hexagon 680 and 685 (as a reminder, the 680/685 contained four scalar units along with two 1024b vector processing units). On the scalar pipeline performance, the new unit sees a 20% increase in performance, which might be due to microarchitectural changes or simply a higher clock speed. On the vector side, the new Hexagon now contains four 1024b vector pipelines, doubling the processing power for those units.
What is entirely new for the new Hexagon IP is the introduction of a new dedicated “tensor accelerator” unit that is seemingly tightly coupled within the larger block. This would most likely be a generally more simplistic MAC array resembling that of other neural network inferencing engines from other vendors. Effectively, this thus represents Qualcomm’s first generation dedicated ML inferencing engine, although Qualcomm does market it as a 4th generation “AI engine” (Previously this nomenclature referred to ML inferencing on CPU, GPU and DSP).
One of the interesting limitations that Qualcomm has wrangled with in previous architectures was that the DSP couldn’t process traditional DSP workloads alongside ML inferencing work. However for the S855, Qualcomm has confirmed that the new Tensor units are able to work concurrently and independently alongside the vector units. This means that traditional image processing tasks can now be done in parallel to inferencing tasks, vastly increasing the real-world usability of inferencing on the new Snapdragon platform.
Alongside the vector and tensor units is a new Voice Assistant AI module, designed to accelerate processing for common AI assistants – Qualcomm specifically names Baidu and Google in this case, although it’s unlikely to be exclusive to these two.
New Spectra 380 ISP brings a new Computer Vision accelerator
Alongside the AI Engine, Qualcomm is pairing it with an upgraded Spectra 380 image signal processor. It also gets a boost to its processing power with a tri-core Computer Vision (CV) ISP. This might seem a little odd – Computer Vision is a family of algorithms that we used to have before the latest advent of machine learning acceleration, so why is it here?
Computer Vision IP is still very useful for various tasks, and in comparison to ML, it's computationally a lot lighter. As such, Qualcomm has deepened its ISP acceleration. With the CV-ISP, Qualcomm believes it can cut power consumption by 4x (presumably a 75% reduction) for image object classification, object segmentation, depth sensing (at 60 FPS), augmented reality body tracking, and image stabilisation.
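Since vendors tend to mix "Nx" factors with percentage savings, here is a small sketch of how the two map onto each other. It assumes the "4x" claim means the CV-ISP uses a quarter of the power for the same task, which is my reading rather than something Qualcomm has spelled out; the function names are my own.

```python
def factor_to_savings(reduction_factor: float) -> float:
    """Convert an 'Nx lower power' claim into a fractional saving."""
    return 1.0 - 1.0 / reduction_factor


def savings_to_factor(savings: float) -> float:
    """Convert a fractional saving back into an 'Nx lower power' factor."""
    return 1.0 / (1.0 - savings)


cv_isp_savings = factor_to_savings(4.0)
print(f"4x lower power = {cv_isp_savings:.0%} saving")
```

Under that reading, a 4x reduction is indeed a 75% saving.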
It’s to be noted that Qualcomm’s claim about being the first computer vision ISP is a bit confusing – it’s definitely not the first SoC with a computer vision IP block, as at least Samsung had this two generations ago. Maybe Qualcomm is referencing some better integration within the ISP pipeline?
With the new accelerator, Qualcomm states that rather than using the AI Engine for everything, the CV-ISP is preferred for these tasks due to the power savings. The company expects many of the features we normally take for granted in photography will now be available for video as well, such as 4K HDR at 60 FPS with both portrait mode enhancements and depth sensing for live bokeh. Power consumption for 4K HDR capture will be 30% lower than the Snapdragon 845.
Also in the cards is support for new generations of image formats. Currently JPEG rules the roost in smartphone photography, however the new S855 will support the ‘High Efficiency Image File Format’, or HEIF, which will offer a large number of new features over JPEG:
- Regular Images Up to 50% Smaller
- HDR Support for multiple color spaces
- Computer Vision data support
- RAW data
- Depth Map support
- Alpha Masking data
- Burst Photography data
- Animated Photography data
- HEVC Video data
The idea here is to enable new use cases for the photos that users take every day. Within the same HEIF file, a device can store a photo from every camera: a telephoto, a wide angle photo, and a super-wide angle photo for example. This allows for post processing such as bokeh adjustment, object segmentation, quality adjustments, and to a certain extent even basic 3D modelling.
Snapdragon 855 Connectivity
With day one being about Qualcomm’s approach to 5G, connectivity is going to be at the heart of the S855. Built into the chip is the Snapdragon X24 modem, which offers LTE Category 20 class download speeds good for up to 2 Gbps on supported devices and networks. The increased speeds are achieved by using up to 7x carrier aggregation with 4×4 MIMO and 256QAM. On the upload side, the new SoC modem now also supports 256QAM for the first time, and is able to double the peak upload rate to up to 300Mbps with 3xCA.
Meanwhile when it comes to 5G, for the S855 platform it’s not the actual SoC which delivers this, but rather the external Snapdragon X50 modem. When combined with the S855, the platform will support both mmWave (>24 GHz) and sub-6 GHz 5G NR technologies. Qualcomm made mention of its entrance in the RFFE (RF front-end) field, which the company now includes in its QTM052 RFFE/antenna mini modules.
Not to be left behind, Qualcomm has also improved its local wireless networking capabilities. The Snapdragon 855 supports the new WiFi 6 standard (802.11ax) as well as the new 60GHz “Wi-Gig” 802.11ay standard, enabled by an external WiFi combo chipset.
Overall the new Snapdragon 855 promises some great improvements in pretty much every single aspect of the SoC. The biggest generational improvements seem to be on the CPU side, where Qualcomm promises a 45% jump in performance. The figure is pretty much in line with what we’ve seen on the Kirin 980, and that SoC performed excellently. And I’m certain Qualcomm will be able to squeeze out even more performance through software.
One big uncertainty that I have is in regards to the CPU configuration. Here at first I thought the 1+3 configuration seemed reasonable, however the disclosure that the different cores are running on the same power/voltage plane has given me some doubts about how this will pan out in terms of power efficiency. Power efficiency of the CPU and the SoC in general use-cases seems to be something that Qualcomm hasn’t talked about at all today and this worries me quite a bit.
On the GPU side, as expected the new Adreno 640 is just an iterative design on the Adreno 630 – increasing the number of ALUs versus its predecessor by 50%. The performance gains were quoted at 20%, which is a bit on the conservative side but not something to be particularly worried about. Power efficiency is claimed to still be leading edge, and at least based on the sustained performance metrics shown it looks like the Snapdragon 855 will continue to lead in this regard.
The Snapdragon 855 is Qualcomm’s first chipset to include a dedicated neural network inference engine / “NPU”. Here Qualcomm claims to have a lead over its nearest competitor, the Kirin 980, however it’s something that will need a thorough investigation in the future, as we’ve seen that both vendors' software stacks aren’t quite mature enough to be able to fully extract the capabilities of the hardware. Still, this is an important step for Qualcomm as it now allows concurrent usage of the DSP for image processing while having a dedicated hardware block for inferencing tasks.
The ISP and the new computer vision hardware seem fantastic; the new features are really raising the bar in what will be possible with computational photography, as well as introducing new aspects such as computational videography. Here we’ll have to see what OEM vendors will be able to actually implement on final commercial devices.
Last but not least, the Snapdragon 855 is the first platform that promises to provide a complete solution of next generation connectivity technologies. Here the features aren’t directly integrated into the SoC but rather into accompanying chipsets, so I do expect that there are going to be many devices that will not choose the full set of next-generation solutions, but we might also see some devices which will implement the new X50 5G modem alongside the new WiFi6/.11ax and 11ay WiFi solutions.
Today’s announcement is just the first wave of information on the Snapdragon 855. To really judge the new chipset, we’re eagerly waiting on Qualcomm to allow us to do a performance preview on the reference designs. Most importantly, we’re still a few months away from the first commercial devices – which will represent the definitive evaluation of the new chipset.