GPUs have become a staple of computing, and Nvidia is accordingly deepening its collaboration with standards bodies and the open source community to push downstream technologies that were once exclusive to the company's own development tools.
A lot of work is being done specifically around programming languages like C++ and Fortran, which are deemed to lag in native support for executing code across highly parallel systems.
The plan is to make generic computing environments and compilers more productive and approachable, Timothy Costa, group product manager for high-performance computing and quantum computing at Nvidia, told _The Register_.
"Ultimately, our goal with the open source community and programming is to enhance concurrency and parallelism for all. I say that because I do mean CPUs and GPUs," Costa said.
Many of the technologies being opened up and brought mainstream are related to the past work done by Nvidia in its CUDA parallel programming framework, which combines open and proprietary libraries.
CUDA was introduced in 2007 as a set of programming tools and frameworks for coders to write programs for GPUs. But the CUDA strategy changed as GPU usage expanded to more applications and sectors.
Nvidia is largely known for dominating the GPU market, but CUDA is at the center of the company repositioning itself as a software and services provider chasing a trillion-dollar market valuation.
The long-term goal is for Nvidia to be a full-stack provider targeting specialized domains that include autonomous driving, quantum computing, health care, robotics, and cybersecurity.
Nvidia has built CUDA libraries specialized in those areas, and also provides the hardware and services that companies can tap into.
The full-stack strategy is best illustrated by the concept of an "AI factory" introduced by CEO Jensen Huang at the recent GPU Technology Conference. The concept is that customers can drop applications in Nvidia's mega datacenters, with the output being a customized AI model that meets specific sector or application requirements.
Nvidia has two ways to earn money via concepts like the AI factory: through the utilization of GPU capacity or the usage of domain-specific CUDA libraries. Programmers can use open-source parallel programming frameworks, including OpenCL, on Nvidia's GPUs. But for those willing to invest, CUDA will provide that extra last-mile boost, as it is tuned to work closely with Nvidia's GPUs.
### Parallel for all
While parallel programming is widespread in HPC, Nvidia's goal is to standardize it in mainstream computing. The company is helping the community standardize best-in-class tools to write parallel code that is portable across hardware platforms, independent of brand, accelerator type or parallel programming framework.
"The complication is – it may be measured as simply as lines of code. If you are, if you're bouncing back and forth between many different programming models, you're going to have more lines of code," Costa said.
For one, Nvidia is involved in a C++ committee that is laying down the piping to orchestrate the parallel execution of code that is portable across hardware. A context might be a CPU thread doing mainly IO, or a CPU or GPU thread doing intensive computation. Nvidia is specifically active in bringing the standard vocabulary and framework for asynchrony and parallelism that C++ programmers are demanding.
"Every institution, every major player, has a C++ and Fortran compiler, so it'd be crazy not to. As the language is advanced, we arrive at somewhere where we have true open standards with performance portability across platforms," Costa said.
"Then users are of course always able, if they want to, to optimize with a vendor-specific programming model that's tied to the hardware. I think we were arriving at kind of a mecca here of productivity for end users and developers," Costa said.
Standardizing at a language level will make parallel programming more approachable to coders, which could ultimately also boost the adoption of open-source parallel programming frameworks like OpenCL, he opined.
Of course, Nvidia's own compiler will extract the best performance and value on its GPUs, but it is important to remove the hurdles to bringing parallelism into language standards, regardless of platform, Costa said.
"Focusing on the language standards is how we make sure we have true breadth of compilers and platform support for performance model programming," he explained, adding that Nvidia has worked with the community for more than a decade to make low-level changes of languages for parallelism.
The initial work was around the memory model, which was included in C++ 11, but it needed to be advanced as parallelism and concurrency started taking hold. The memory model in C++ 11 focused on concurrent execution across multicore chips, but lacked the hooks for parallel programming.
The C++ 17 standard introduced the groundwork for higher-level parallelism features, but true portability is coming in future standards. The current standard is C++ 20, with C++ 23 coming up.
"The great thing now is because that piping has been laid, if you start looking at the next iterations of the standard, you'll see more and more user facing and productive features that are going into these languages, which are truly performance portable. Any hardware architecture in the CPU and GPU space will be able to leverage these," Costa promised. ®