7,050 results


Book chapter Open access Peer reviewed

Muthu Manikandan Baskaran, J. Ramanujam, P. Sadayappan

Graphics Processing Units (GPUs) offer tremendous computational power. CUDA (Compute Unified Device Architecture) provides a multi-threaded ... parallel view make manual development of high-performance CUDA code rather complicated. Hence the automatic transformation of sequential input programs into efficient parallel CUDA programs is of considerable interest. This paper describes an automatic code transformation system that generates parallel CUDA code from input sequential C code, for regular ( ... optimization practically effective, we develop a C-to-CUDA transformation system that generates two-level parallel CUDA ...

Topic(s): Real-Time Systems Scheduling

2010 - Springer Science+Business Media | Lecture notes in computer science

Article Peer reviewed

M J Harvey, Gianni De Fabritiis

... The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by ... Swan" for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% ... portable GPU applications but that the more mature CUDA tools continue to provide best performance. Program title: ...

Topic(s): Software Testing and Debugging Techniques

2011 - Elsevier BV | Computer Physics Communications

Article Peer reviewed

Yukihiro Komura, Yutaka Okabe

We present new versions of sample CUDA programs for the GPU computing of the Swendsen–Wang multi-cluster spin flip algorithm. In this update, we add the method of ... 26316 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: No limits (tested on ... multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on ... for high-precision Monte Carlo simulations. In the CUDA, the cuRAND library [2], which focuses on the ...

Topic(s): Random Matrices and Applications

2015 - Elsevier BV | Computer Physics Communications

Article Peer reviewed

Emanuele Manca, Andrea Manconi, Alessandro Orro, Giuliano Armano, Luciano Milanesi

... the GPU‐quicksort, a compute‐unified device architecture (CUDA) iterative implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA‐quicksort an iterative GPU‐based implementation of the sorting algorithm. CUDA‐quicksort has been designed starting from GPU‐quicksort. ... performed on six sorting benchmark distributions show that CUDA‐quicksort is up to four times faster than ... An in‐depth analysis of the performance between CUDA‐quicksort and GPU‐quicksort shows that the main ...

Topic(s): Advanced Data Storage Technologies

2015 - Wiley | Concurrency and Computation Practice and Experience

Article Open access Brazil National production Peer reviewed

Vladimir Lončar, Luis E. Young-S., Srdjan Škrbić, Paulsamy Muruganandam, Sadhan K. Adhikari, Antun Balaž

... new versions of the previously published C and CUDA programs for solving the dipolar Gross–Pitaevskii equation ... on distributed-memory systems. Finally, previous three-dimensional CUDA-parallelized programs are further parallelized using MPI, similarly ... comparison with the previous sequential C and parallel CUDA programs. The improvements to the sequential version yield ... on a computer cluster with 32 nodes used. CUDA/MPI version shows a speedup of 9–10 ... with 32 nodes. Program Title: DBEC-GP-OMP-CUDA-MPI: (1) DBEC-GP-OMP package: (i) imag1dX- ...

Topic(s): Cold Atom Physics and Bose-Einstein Condensates

2016 - Elsevier BV | Computer Physics Communications

Article Peer reviewed

Mubeen Ghafoor, Shahzaib Iqbal, Syed Ali Tariq, Imtiaz Ahmad Taj, Noman M. Jafri

... NVIDIA [23-25] introduced 'compute unified device architecture' (CUDA) in 2006. GPUs have been used efficiently in ... 2. Section 3 discusses the GPU and NVIDIA CUDA architecture. Section 4 discusses the proposed implementation of ... overview of the GPU architecture and introduces NVIDIA CUDA programming architecture. 3 GPU and NVIDIA CUDA architecture To transform or map CPU algorithm to ... power of GPU can be optimally utilised. NVIDIA CUDA is the hardware/software architecture where hardware architecture ...

Topic(s): Forensic Fingerprint Detection Methods

2017 - Institution of Engineering and Technology | IET Image Processing

Article Open access Peer reviewed

Stefan K. Muller, Jan Hoffmann

... high throughput in vector-parallel applications. NVIDIA's CUDA toolkit seeks to make GPGPU programming accessible by ... small extension of C/C++. However, due to CUDA's complex execution model, the performance characteristics of CUDA kernels are difficult to predict, especially for novice ... paper introduces a novel quantitative program logic for CUDA kernels, which allows programmers to reason about both functional correctness and resource usage of CUDA kernels, paying particular attention to a set of ...

Topic(s): Embedded Systems Design Techniques

2021 - Association for Computing Machinery | Proceedings of the ACM on Programming Languages

Article Peer reviewed

Masashi Fukuzawa, Jeffrey G. Williams

ABSTRACT The cudA gene encodes a nuclear protein that is essential for normal multicellular development. At the slug stage cudA is expressed in the prespore cells and in ... show that cap site distal promoter sequences direct cudA expression in prespore cells, while proximal sequences direct ... acting part of the prespore domain of the cudA promoter. However, Dd-STATa cannot be utilised for ... shows that Dd-STATa is not necessary for cudA transcription in prespore cells. In contrast, the part of the cudA promoter that directs prestalk-specific expression contains a ...

Topic(s): Biocrusts and Microbial Ecology

2000 - The Company of Biologists | Development

Article Peer reviewed

Peitao Song, Zhijian Zhang, Qian Zhang, Liang Liang, Qiang Zhao

... cluster. In this paper, a heterogeneous MPI + OpenMP/CUDA parallel algorithm for solving the 2D neutron transport ... exploited through OpenMP (in CPU calculated domain) and CUDA (in GPU calculated domain) based on the ray ... Moreover, the strong scaling performance of the MPI + CUDA parallelization is studied through a performance analysis model ... GPUs, and the MPI communication in the MPI + CUDA parallel algorithm. And the corresponding conclusion is still tenable for the MPI + OpenMP/CUDA parallelization. The C5G7 2D benchmark and an extended ...

Topic(s): Advanced Neural Network Applications

2019 - Elsevier BV | Annals of Nuclear Energy

Article Open access Peer reviewed

Yoko Yamada, Hong Yu Wang, Masashi Fukuzawa, Geoffrey J. Barton, Jeffrey G. Williams

CudA, a nuclear protein required for Dictyostelium prespore-specific gene expression, binds in vivo to the promoter ... 14 nucleotide region of the cotC promoter binds CudA in vitro and ECudA, an Entamoeba CudA homologue, also binds to this site. The CudA and ECudA DNA-binding sites contain a dyad and, consistent with a symmetrical binding site, CudA forms a homodimer in the yeast two-hybrid system. Mutation of CudA binding sites within the cotC promoter reduces expression from cotC in prespore cells. The CudA and ECudA proteins share a 120 amino acid ...

Topic(s): interferon and immune responses

2008 - The Company of Biologists | Development

Article Open access Peer reviewed

Matthew J. Thurley, V. Danell

... for faster morphological image processing, and the NVIDIA CUDA architecture offers a relatively inexpensive and powerful framework ... generic morphological erosion and dilation operation in the CUDA NPP library is relatively naive, and performance scales ... morphological image processing community. Open-source extensions to CUDA (hereafter referred to as LTU-CUDA) have been produced for erosion and dilation using ... by forgoing the use of shared memory in CUDA multiprocessors. The vHGW algorithm for erosion and dilation ...

Topic(s): Advanced Neural Network Applications

2012 - Institute of Electrical and Electronics Engineers | IEEE Journal of Selected Topics in Signal Processing

Article

Yi Yang, Huiyang Zhou

... parallel program, such as a GPU kernel in CUDA programs, still contains both sequential code and ... our proposed solution to exploit nested parallelism in CUDA, referred to as CUDA-NP. With CUDA-NP, we initially enable a high number of ... for different code sections. We implemented our proposed CUDA-NP framework using a directive-based compiler approach. ... like pragmas for parallelizable code sections. Then, our CUDA-NP compiler automatically generates the optimized GPU ... optimized and contain nested parallelism, our proposed CUDA-NP framework further improves the performance by ...

Topic(s): Interconnection Networks and Systems

2014 - Association for Computing Machinery | ACM SIGPLAN Notices

Article Peer reviewed

Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen‐mei Hwu

... this work, we adapt one such language, the CUDA programming model, into a new FPGA design flow ... the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high- ... that transforms the SIMT (Single Instruction, Multiple Thread) CUDA code into task-level parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive ... best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and ...

Topic(s): Interconnection Networks and Systems

2013 - Association for Computing Machinery | ACM Transactions on Embedded Computing Systems

Article Open access Peer reviewed

Yi Yang, Chao Li, Huiyang Zhou

... parallel program, such as a GPU kernel in CUDA programs, still contains both sequential code and parallel ... our proposed solution to exploit nested parallelism in CUDA, referred to as CUDA-NP. With CUDA-NP, we initially enable a high number of ... for different code sections. We implement our proposed CUDA-NP framework using a directive-based compiler approach. ... like pragmas for parallelizable code sections. Then, our CUDA-NP compiler automatically generates the optimized GPU kernels. ... been optimized and contain nested parallelism, our proposed CUDA-NP framework further improves the performance by up ...

Topic(s): Interconnection Networks and Systems

2015 - Springer Science+Business Media | Journal of Computer Science and Technology

Article Open access Peer reviewed

Michał Januszewski, Marcin Kostur

... with popular NVIDIA Graphics Processing Units using the CUDA programming environment. We address general aspects of numerical ... etc.: 5905 Distribution format: tar.gz Programming language: CUDA C Computer: any system with a CUDA-compatible GPU Operating system: Linux RAM: 64 MB ... 3 External routines: The program requires the NVIDIA CUDA Toolkit Version 2.0 or newer and the ... and perform the calculations on GPUs using the CUDA programming environment. The GPU's ability to execute ... question is performed on a GPU using the CUDA environment. Running time: < 1 minute

Topic(s): stochastic dynamics and bifurcation

2009 - Elsevier BV | Computer Physics Communications

Article Open access Peer reviewed

Étiennette Combe, T. Achi, R. Pion, MC Valluy, ML Houlier, M. SALLAS, A. SELLE

... satisfy growth requirements. The nitrogen CUDa was respectively 72 - 75 - ... for the faba bean - lentil - chickpea groups, but the CUDa of certain essential amino acids was markedly more ... 71 - 75 for valine, whereas the arginine CUDa was always higher 87 - ... to suit growth requirements. Nitrogen apparent digestibility coefficient (CUDa) was 72% in the faba bean, 75% in ... the chickpea groups respectively, but the CUDa of some essential amino acids was much lower: ... cystine, 73 - 71 - 75% for valine, while arginine CUDa values (87 - 87 - 82) were higher than all ...

Topic(s): Proteins in Food Systems

1991 - Elsevier BV | annales de biologie animale biochimie biophysique

Article Peer reviewed

Jie Cheng

... by using an extension to C language, in CUDA which is a parallel programming environment supported on ... Hwu is principal investigator for the first NVIDIA CUDA Center of Excellence at the University of Illinois ... It also covers data parallelism, the basics of CUDA memory/threading models, the CUDA extensions to the C language, and the basic ... 7) enhances student programming skills by explaining the CUDA memory model and its types, strategies for reducing global memory traffic, the CUDA threading model and granularity which include thread scheduling ...

Topic(s): Cloud Computing and Resource Management

2010 - | Scalable Computing Practice and Experience

Article Peer reviewed

Wenqian Jiang, Menghao Zhang, Yichen Wang

... from vegetations. Nevertheless, the Compute Unified Device Architecture (CUDA) gives developers access to the virtual instruction set ... memory of the parallel computational elements in the CUDA compatible Graphics Processing Unit (GPU), which encourages us to develop a CUDA-based simulator for the solution. This paper analyzes the radiative transfer method and the CUDA architecture, and then presents a CUDA parallel algorithm for calculating the EM scattering from a two-layer vegetation canopy. In the CUDA-based simulation, with a GTS250 GPU as, which ...

Topic(s): Cryospheric studies and observations

2010 - Taylor & Francis | Journal of Electromagnetic Waves and Applications

Article Open access Peer reviewed

Haixiang Shi, Bertil Schmidt, Weiguo Liu, Wolfgang Müller‐Wittig

... we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 ... datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from ... existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. CUDA-MI is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant speedup ...

Topic(s): DNA and Biological Computing

2011 - BioMed Central | BMC Research Notes

Article Open access

Jayshree Ghorpade-Aher

... well. In this paper, we will show how CUDA can fully utilize the tremendous power of these GPUs. CUDA is NVIDIA's parallel computing architecture. It enables ... power of the GPU. This paper talks about CUDA and its architecture. It takes us through a comparison of CUDA C/C++ with other parallel programming languages like ... paper also lists out the common myths about CUDA and how the future seems to be promising for CUDA.

Topic(s): Advanced Image and Video Retrieval Techniques

2012 - | Advanced Computing An International Journal

Article Open access Peer reviewed

Yukihiro Komura, Yutaka Okabe

We present sample CUDA programs for the GPU computing of the Swendsen–Wang multi-cluster spin flip algorithm. We deal with the classical ... 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: ... multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on ...

Topic(s): Random Matrices and Applications

2013 - Elsevier BV | Computer Physics Communications

Article

Panagiotis D. Michailidis, Konstantinos G. Margaritis

... Processing Units (GPUs) using Compute Unified Device Architecture (CUDA) programming model. In this work we discuss a naive and two optimised CUDA algorithms for the two kernel estimation methods: univariate ... also present exploratory experimental results of the proposed CUDA algorithms according to the several values of parameters ... results show significant performance gains of all proposed CUDA algorithms over serial CPU version and small performance speed-ups of the two optimised CUDA algorithms over naive GPU algorithms. Finally, based on ...

Topic(s): Advanced Data Compression Techniques

2013 - | Applied Mathematical Sciences

Article Peer reviewed

Vincent Roberge, Mohammed Tarbouchi

... optimization (PSO) on graphical processing units (GPU) using CUDA. By fully utilizing the processing power of graphic processors, our implementation (CUDA-PSO) provides a speedup of 167× compared to ... CPU, it may be unfair to compare our CUDA implementation to a sequential one. For this reason, ... MPI-PSO) and compared its performance against our CUDA-PSO. The execution time of our CUDA-PSO remains 15.8× faster than our MPI- ... statistical significance that the results obtained using our CUDA-PSO are of equal quality as the results ...

Topic(s): Islanding Detection in Power Systems

2013 - Imperial College Press | International Journal of Computational Intelligence and Applications

Article Peer reviewed

Gordon E. Davis

... multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow ... merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that use either MPI ... large data sets, and a dual-level MPI-CUDA implementation with maximum overlapping of computation and communication ... also find that our tri-level MPI-OpenMP-CUDA parallel implementation does not offer a significant advantage ...

Topic(s): Plant Virus Research Studies

1956 - Elsevier BV | Experimental Parasitology

Article Peer reviewed

Yongchao Liu, Bertil Schmidt, Weiguo Liu, Douglas L. Maskell

... to employ emerging many-core architectures such as CUDA-enabled GPUs. In this paper, we present a ... of the MEME motif discovery algorithm using the CUDA programming model. To achieve high efficiency, we introduce ... ZOOPS) motif search model. The runtime speedups of CUDA–MEME on a single GPU are also comparable ... workstation cluster. In addition to the fast speed, CUDA–MEME has the capability of finding motif instances ...

Topic(s): Fractal and DNA sequence analysis

2009 - Elsevier BV | Pattern Recognition Letters

Book chapter Open access Peer reviewed

Yonghong Yan, Max Grossman, Vivek Sarkar

... GPGPUs) to obtain order-of-magnitude performance improvements. CUDA has emerged as a popular programming model for ... and C#, it is natural to explore how CUDA-like capabilities can be made accessible to those ... can be used by Java programmers to invoke CUDA kernels. Using this interface, programmers can write Java codes that directly call CUDA kernels, and delegate the responsibility of generating the Java-CUDA bridge codes and host-device data transfer calls ...

Topic(s): Advanced Data Storage Technologies

2009 - Springer Science+Business Media | Lecture notes in computer science

Article Peer reviewed

Tianyi David Han, Tarek S. Abdelrahman

... GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the programmer the burden of packaging ... hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious ... compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real- ... and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes ...

Topic(s): Real-Time Systems Scheduling

2010 - Institute of Electrical and Electronics Engineers | IEEE Transactions on Parallel and Distributed Systems

Article Open access Peer reviewed

Wladimir J. van der Laan, Andrei C. Jalba, Jos B. T. M. Roerdink

... regarded as massively parallel coprocessors through NVidia's CUDA compute paradigm. The three main hardware architectures for ... based) are shown to be unsuitable for a CUDA implementation. Our CUDA-specific design can be regarded as a hybrid ... to an optimized CPU implementation and earlier non-CUDA-based GPU DWT methods, both for 2D images ... performance analysis shows that the results of our CUDA-specific design are in close agreement with our ...

Topic(s): Digital Filter Design and Implementation

2010 - Institute of Electrical and Electronics Engineers | IEEE Transactions on Parallel and Distributed Systems

Article Peer reviewed

Tomasz Dziubak, Jacek Matulewski

... FFT algorithm. The solution is based on NVIDIA CUDA technology. The speed-up factor in the test ... format: tar.gz Programming language: C++, C for CUDA Computer: Graphics card with CUDA technology recommended Operating system: No limits (tested on ... of processors used – one CPU core and all CUDA cores of the selected processor of graphics card ... equation. Solution method: FFT and Chebyshev polynomial algorithm, CUDA technology. Running time: Every test example included in ...

Topic(s): Spectroscopy and Quantum Chemical Studies

2011 - Elsevier BV | Computer Physics Communications

Article Peer reviewed

Daniel Kuchelmeister, Thomas Müller, Marco Ament, Günter Wunner, Daniel Weiskopf

... GPU using NVidia’s Compute Unified Device Architecture (CUDA), which leads to performance improvement of an order ... 1334251 Distribution format: tar.gz Programming language: C++, CUDA. Computer: Linux platforms with a NVidia CUDA enabled GPU (Compute Capability 1.3 or higher), C++ compiler, NVCC (The CUDA Compiler Driver). Operating system: Linux. RAM: 2 GB ... External routines: OpenGL Utility Toolkit development files, NVidia CUDA Toolkit 3.2, Lua5.2 Nature of problem: ... of light rays, GPU-based parallel programming using CUDA, 3D-Rendering via OpenGL. Running time: Problem dependent, ...

Topic(s): Pulsars and Gravitational Waves Research

2012 - Elsevier BV | Computer Physics Communications