Scientific Publications

[J53] Evaluation of Static Mapping for Dynamic Space-Shared Multi-Task Processing on FPGAs
U. Minhas, R. Woods and G. Karakonstantis
Springer Journal of Signal Processing Systems
Invited Paper, accepted Jan 2021
DOI:

[J52] Towards Lower Precision Adaptive Filters: Facts from Backward Error Analysis of RLS
JunKyu Lee and Hans Vandierendonck
IEEE Transactions on Signal Processing
Under Review from Major Revision
DOI:

[J51] Mixed-precision kernel recursive least squares
JunKyu Lee, Dimitrios S. Nikolopoulos and Hans Vandierendonck
IEEE Transactions on Neural Networks and Learning Systems
December 2020
DOI: 10.1109/TNNLS.2020.3041677

[J50] Reducing the Burden of Parallel Loop Schedulers for Many-Core Processors
Mahwish Arif and Hans Vandierendonck
Concurrency and Computation: Practice and Experience
Accepted for Publication
DOI:

[J49] Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing
Florian Zaruba, Fabian Schuiki, Luca Benini
IEEE Micro
Under Review
DOI:

[J48] Designing High Performance Algorithms for Architectures Over-Provisioned for Arithmetic Performance
H. Anzt, E. S. Quintana
Parallel Computing
Under Review
DOI:

[J47] Compressed Basis GMRES on High Performance GPUs
J. Aliaga, H. Anzt, T. Grützmacher, E. S. Quintana, A. E. Tomás
ACM Trans. on Mathematical Software
Under Review
DOI:

[J46] Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing
H. Anzt, T. Cojean, G. Flegar, F. Goebel, T. Gruetzmacher, T. Ribizel, P. Nayak, Y. Tsai, E. S. Quintana
ACM Trans. on Mathematical Software
Under Review
DOI:

[J45] FPnew: An Open-Source Multi-Format Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing
Stefan Mach, Fabian Schuiki, Florian Zaruba, Luca Benini
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Under Review
DOI: https://arxiv.org/abs/2007.01530

[J44] DTA-PUF: Dynamic Timing Aware Physical Unclonable Function for Resource Constrained Devices
I. Tsiokanos, J. Miskely, C. Gu, M. O’Neill and G. Karakonstantis
Accepted in ACM Journal on Emerging Technologies in Computing Systems (JETC)
2020
DOI:

[J43] AIR: Iterative refinement using arbitrary dynamic precision
JunKyu Lee, Gregory D. Peterson, Dimitrios S. Nikolopoulos, Hans Vandierendonck
Parallel Computing
September 2020
DOI: 10.1016/j.parco.2020.102663

[J42] Fundamental Limits in Dissipative Processes during Computation
Davide Chiuchiù, Maria Cristina Diamantini, Miquel López-Suárez, Igor Neri, Luca Gammaitoni
Entropy
19 August, 2020
DOI: 10.3390/e21090822

[J41] Always-On 674μW@4GOP/s Error Resilient Binary Neural Networks With Aggressive SRAM Voltage Scaling on a 22-nm IoT End-Node
Alfio Di Mauro, Francesco Conti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini
IEEE Transactions on Circuits and Systems I: Regular Papers
4 August 2020
DOI: 10.1109/TCSI.2020.3012576

[J40] Efficient Hardware Architectures for 1D- and MD-LSTM Networks
V. Rybalkin, C. Sudarshan, C. Weis, J. Lappas, N. Wehn, L. Cheng
Springer Journal of Signal Processing Systems
August, 2020
DOI: 10.1007/s11265-020-01554-x

[J39] ExHero: Execution History-aware Error-rate Estimation in Pipelined Designs
I. Tsiokanos and G. Karakonstantis
IEEE Micro
27 July 2020
DOI: 10.1109/MM.2020.3012045

[J38] Thermodynamic reversible transformations in micro-electro-mechanical systems
Igor Neri, Miquel Lopez-Suarez
Eur. Phys. J. B
18 June 2020
DOI: 10.1140/epjb/e2018-80632-9

[J37] Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores
Fabian Schuiki, Florian Zaruba, Torsten Hoefler, Luca Benini
IEEE Transactions on Computers
15 April 2020
DOI: 10.1109/TC.2020.2987314

[J36] Acceleration of PageRank with customized precision based on mantissa segmentation
T. Grützmacher, T. Cojean, G. Flegar, H. Anzt, E. S. Quintana
ACM Trans. on Parallel Computing
March 2020
DOI: 10.1145/3380934

[J35] Tall-and-Skinny QR Factorization with Approximate Householder Reflectors on Graphics Processors
A. E. Tomás, Enrique S. Quintana-Ortí
Journal of Supercomputing
17 January 2020
DOI: 10.1007/s11227-020-03176-3

[J34] Performance-aware predictive-model-based on-chip body-bias regulation strategy for an ULP multi-core cluster in 28 nm UTBB FD-SOI
Alfio Di Mauro, Davide Rossi, Antonio Pullini, Philippe Flatresse, Luca Benini
Integration, 72, pp.194-207, Elsevier, 2020
14 January 2020
DOI: 10.1016/j.vlsi.2019.12.006

[J33] PULP-NN: accelerating quantized neural networks on parallel ultra-low-power RISC-V processors
Garofalo, Angelo, Manuele Rusci, Francesco Conti, Davide Rossi, and Luca Benini
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
23 Dec 2019
DOI: 10.1098/rsta.2019.0155

[J32] EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators
Lukas Cavigelli, Georg Rutishauser, Luca Benini
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
29 Oct 2019
DOI: 10.1109/JETCAS.2019.2950093

[J31] FloatX: A C++ library for customized floating-point arithmetic
Goran Flegar, Florian Scheidegger, Vedran Novakovic, Giovani Mariani, A. E. Tomás, A. Cristiano I. Malossi, Enrique S. Quintana-Ortí
ACM Trans. on Mathematical Software
19 Dec 2019
DOI: 10.1109/VLSI-SoC.2019.8920307

[J30] A machine learning approach to online fault classification in HPC systems
Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sîrbu, Andrea Bartolini, Andrea Borghesi
Future Generation Computer Systems, Volume 110, Pages 1009-1022, Elsevier, 2020
27 Nov. 2019
DOI: 10.1016/j.future.2019.11.029

[J29] A 0.80pJ/flop, 1.24Tflop/sW 8-to-64 bit Transprecision Floating-Point Unit for a 64 bit RISC-V Processor in 22nm FD-SOI
Stefan Mach, Fabian Schuiki, Florian Zaruba, Luca Benini
2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC)
6-9 October 2019
DOI: 10.1109/VLSI-SoC.2019.8920307

[J28] A 64-mW DNN-Based Visual Navigation Engine for Autonomous Nano-Drones
Daniele Palossi, Antonio Loquercio, Francesco Conti, Eric Flamand, Davide Scaramuzza, Luca Benini
IEEE Internet of Things Journal
October 2019
DOI: 10.1109/TVLSI.2019.2926114

[J27] The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology
Florian Zaruba, Luca Benini
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
26 July 2019
DOI: 10.1109/TVLSI.2019.2926114

[J26] Efficient Hardware Architectures for 1D- and MD-LSTM Networks
V. Rybalkin, C. Sudarshan, C. Weis, J. Lappas, N. Wehn, L. Chen
Springer Journal of Signal Processing Systems
2 July 2020
DOI: 10.1007/s11265-020-01554-x

[J25] Towards a modular precision ecosystem for high performance computing
H. Anzt, G. Flegar, T. Grützmacher, E. S. Quintana-Ortí
The International Journal of High Performance Computing Applications
May 9, 2019
DOI: 10.1177/1094342019846547

[J24] Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
R. Andri, L. Cavigelli, D. Rossi and L. Benini
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
June 2019
DOI: 10.1109/JETCAS.2019.2905654

[J23] Online Learning and Classification of EMG-Based Gestures on a Parallel Ultra-Low Power Platform Using Hyperdimensional Computing
S. Benatti, F. Montagna, V. Kartsch, A. Rahimi, D. Rossi and L. Benini
IEEE Transactions on Biomedical Circuits and Systems
June 2019
DOI: 10.1109/TBCAS.2019.2914476

[J22] Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing
A. Pullini, D. Rossi, I. Loi, G. Tagliavini and L. Benini
IEEE Journal of Solid-State Circuits
15 May, 2019
DOI: 10.1109/JSSC.2019.2912307

[J21] Towards a modular precision ecosystem for high performance computing
H. Anzt, G. Flegar, T. Grützmacher, E. S. Quintana-Ortí
Int. J. of High Performance Computing Applications
May 9, 2019
DOI: 10.1177/1094342019846547

[J20] Significance-Driven Data Truncation for Preventing Timing Failures
Ioannis Tsiokanos, Lev Mukhanov, Dimitrios S. Nikolopoulos, Georgios Karakonstantis
IEEE Transactions on Device and Materials Reliability
12 February, 2019
DOI: 10.1109/TDMR.2019.2898949

[J19] Dynamic look-ahead in the reduction to band form for the Singular Value Decomposition
A. E. Tomás, R. Rodríguez-Sánchez, S. Catalán, R. Carratalá, E. S. Quintana-Ortí
Parallel Computing, Volume 81
January 2019
DOI: 10.1109/TDMR.2019.2898949

[J18] NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs
Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, and Luca Benini
ACM Trans. Reconfigurable Technol. Syst.
December, 2018
DOI: 10.1145/3284357

[J17] FlexFloat: A Software Library for Transprecision Computing
Giuseppe Tagliavini, Andrea Marongiu, Luca Benini
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
December, 2018
DOI: 10.1109/TCAD.2018.2883902

[J16] A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets
Fabian Schuiki, Michael Schaffner, Frank K. Gurkaynak, Luca Benini
IEEE Transactions on Computers (Volume: 68, Issue: 4, April 1 2019), USA
DOI: 10.1109/TC.2018.2876312
22 October 2018

[J15] A sensor fusion approach for drowsiness detection in wearable ultra-low-power systems
Victor Javier Kartsch, Simone Benatti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini
Science Direct, Information Fusion, Volume 43, Pages 66-76
September, 2018
DOI: 10.1016/j.inffus.2017.11.005

[J14] Energy-Efficient Iterative Refinement using Dynamic Precision
JunKyu Lee, Hans Vandierendonck, Mahwish Arif, Gregory D. Peterson, Dimitrios S. Nikolopoulos
IEEE Journal of Emerging and Selected Topics in Circuits and Systems
25 June, 2018
DOI: 10.1109/JETCAS.2018.2850665

[J13] An Energy-Efficient Integrated Programmable Array Accelerator and Compilation flow for Near-Sensor Ultra-low Power Processing
S. Das, K. J. M. Martin, D. Rossi, P. Coussy and L. Benini
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE, 2018, pp.1 – 1.
8 May, 2018
DOI: 10.1109/TCAD.2018.2834397

[J12] The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores
I. Loi, A. Capotondi, D. Rossi, A. Marongiu and L. Benini
IEEE Transactions on Parallel and Distributed Systems (Volume: 29, Issue: 2)
April-June, 2018
DOI: 10.1109/TPDS.2017.2752706

[J11] Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes
E. Azarkhish, D. Rossi, I. Loi and L. Benini
IEEE Transactions on Parallel and Distributed Systems (Volume: 29, Issue: 2)
1 Feb, 2018
DOI: 10.1109/TPDS.2017.2752706

[J10] A Machine Learning Approach for Automated Wide-Range Frequency Tagging Analysis in Embedded Neuromonitoring Systems
Fabio Montagna, Marco Buiatti, Simone Benatti, Davide Rossi, Elisabetta Farella, Luca Benini
Science Direct (Parallel Computing)
22 June, 2017
DOI: 10.1016/j.ymeth.2017.06.019

[J9] 2.2-μW Cognitive Always-On Wake-Up Circuit for Event-Driven Duty-Cycling of IoT Sensor Nodes
Giovanni Rovere, Schekeb Fateh, Luca Benini
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (Volume: 8, Issue: 3, Sept. 2018)
DOI: 10.1109/JETCAS.2018.2828505
19 April, 2018

[J8] Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers
Hartwig Anzt, Jack Dongarra, Goran Flegar, Nicholas Higham, Enrique S. Quintana-Ortí
Concurrency and Computation: Practice and Experience
12 March 2018
DOI: 10.1002/cpe.4460

[J7] Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD
Rafael Rodríguez-Sánchez, Sandra Catalán, Jose R. Herrero, Enrique S. Quintana-Ortí, Andrés Tomás
Numerical Algorithms
March 2018
DOI: 10.1007/s11075-018-0500-8

[J6] Cost of remembering a bit of information
D. Chiuchiù, M. López-Suárez, I. Neri, M.C. Diamantini, L. Gammaitoni
Physical Review A
8 May, 2018
DOI: 10.1103/PhysRevA.97.052108

[J5] Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors
Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Ortí
Parallel Computing
DOI: 10.1016/j.parco.2017.12.006
22 January, 2018

[J4] Flexible, Scalable and Energy Efficient Bio-Signals on the PULP Platform: A Case Study on Seizure Detection
Fabio Montagna, Simone Benatti, Davide Rossi
Journal of Low Power Electronics and Applications
2017, 7(2), 16
DOI: 10.3390/jlpea7020016

[J3] An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
Francesco Conti, Robert Schilling, Pasquale D. Schiavone, Antonio Pullini, Davide Rossi, Frank K. Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini
IEEE Transactions on Circuits and Systems I: Regular Papers (Volume: PP, Issue: 99)
13 May 2017
DOI: 10.1109/TCSI.2017.2698019

[J2] A Hybrid Instruction Prefetching Mechanism for Ultra Low-Power Multicore Clusters
Maryam Payami, Erfan Azarkhish, Igor Loi, Luca Benini
24 May 2017
IEEE Embedded Systems Letters (Volume: 9, Issue: 4, Dec. 2017), USA
DOI: 10.1109/LES.2017.2707978

[J1] A Prosthetic Hand Body Area Controller Based on Efficient Pattern Recognition Control Strategies
S. Benatti, B. Milosevic, E. Farella, E. Gruppioni, L. Benini
Sensors 2017, Basel, Switzerland
15 April 2017
DOI: 10.3390/s17040869

Conference Proceedings

[C80] TOD: Transprecise-based Object Detection to Maximise Real-Time Accuracy on the Edge
JunKyu Lee, Blesson Varghese, Roger Woods and Hans Vandierendonck
IEEE International Conference on Fog and Edge Computing 2021
Submitted
DOI:

[C79] DStress: Automatic Synthesis of DRAM Reliability Using Genetic Algorithms
Lev Mukhanov, Dimitrios S. Nikolopoulos, Georgios Karakonstantis
IEEE/ACM International Symposium on Microarchitecture (MICRO), Global Online Event
Nominee for the MICRO2020 Best Paper Award
DOI: 10.1109/MICRO50266.2020.00035
17-21 Oct. 2020

[C78] Efficient Generation of Application Specific Memory Controllers
M. V. Natale, M. Jung, K. Kraft, F. Lauer, J. Feldmann, C. Sudarshan, C. Weis, S. O. Krumke, N. Wehn
ACM/IEEE International Symposium on Memory Systems (MEMSYS 2020), Washington, DC, USA.
DOI:
10-21 Oct. 2020

[C77] Multi-Valued Physical Unclonable Functions based on Dynamic Random Access Memory
S. Müelich, C. Sudarshan, C. Weis, M. Bossert, R. F. H. Fischer, N. Wehn
ACM/IEEE International Symposium on Memory Systems (MEMSYS 2020), Washington, DC, USA.
DOI:
10-21 Oct. 2020

[C76] An Energy Efficient 3D-Heterogeneous Main Memory Architecture for Mobile Devices
D. M. Mathew, F. S. Prado, É. F. Zulian, C. Weis, M. M. Ghaffar, M. Jung, N. Wehn
ACM/IEEE International Symposium on Memory Systems (MEMSYS 2020), Washington, DC, USA.
DOI:
10-21 Oct. 2020

[C75] An In-DRAM Architecture for Quantized CNNs using Fast Winograd Convolutions
M. M. Ghaffar, C. Sudarshan, C. Weis, M. Jung, N. Wehn
ACM/IEEE International Symposium on Memory Systems (MEMSYS 2020), Washington, DC, USA.
DOI:
10-21 Oct. 2020

[C74] Access-Aware Per-Bank DRAM Refresh for Reduced DRAM Refresh Overhead
É. F. Zulian, C. Weis, N. Wehn
IEEE International Symposium on Circuits & Systems (ISCAS), Seville, Spain.
DOI: 10.1109/ISCAS45731.2020.9180873
10-21 Oct. 2020

[C73] ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor
Andreas Kurth, Samuel Riedel, Florian Zaruba, Torsten Hoefler, Luca Benini
2020 57th ACM/IEEE Design Automation Conference (DAC), Virtual Conference
DOI: 10.1109/DAC18072.2020.9218661
9 Oct. 2020

[C72] Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack
Dionysios Diamantopoulos, Burkhard Ringlein, Mitra Purandare, Gagandeep Singh, Christoph Hagleitner
2020 30th International Conference on Field-Programmable Logic and Applications (FPL)
DOI: 10.1109/FPL50879.2020.00058
Sept. 2020

[C71] NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling
Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan Gómez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal
2020 30th International Conference on Field-Programmable Logic and Applications (FPL)
DOI: 10.1109/FPL50879.2020.00014
Sept. 2020

[C70] Half-Precision Floating-Point Formats for PageRank: Opportunities and Challenges
Amir S. Molahosseini, Hans Vandierendonck
HPEC ’20: IEEE High-Performance Extreme Computing Virtual Conference
DOI: 10.1109/HPEC43674.2020.9286179
21-25 Sept. 2020

[C69] Temporal Variability Analysis in sEMG Hand Grasp Recognition using Temporal Convolutional Networks
Marcello Zanghieri, Simone Benatti, Francesco Conti, Alessio Burrello, Luca Benini
2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Genoa, Italy
DOI: 10.1109/AICAS48895.2020.9073888
31 Aug. – 4 Sept. 2020

[C68] Balanced and compressed coordinate layout for the sparse matrix-vector product on GPUs
J. I. Aliaga, H. Anzt, E. S. Quintana, A. E. Tomás, Y. M. Tsai
Accepted at HeteroPar 2020, Warsaw, Poland
DOI:
24-28 August 2020

[C67] Multiprecision block-Jacobi for iterative triangular solves
F. Goebel, H. Anzt, T. Cojean, G. Flegar, E. S. Quintana
Euro-Par 2020, Warsaw, Poland
DOI: 10.1007/978-3-030-57675-2_34
August 24-28, 2020

[C66] Injective Domain Knowledge in Neural Networks for Transprecision Computing
Andrea Borghesi, Federico Baldo, Michele Lombardi, Michela Milano
6th Annual Conference on Machine Learning, Optimization and Data Science (LOD), Siena, Italy
DOI: 10.1007/978-3-030-64583-0_52
July 19-23, 2020

[C65] A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference
Gianmarco Ottavi, Angelo Garofalo, Giuseppe Tagliavini, Francesco Conti, Luca Benini, Davide Rossi
2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Limassol, Cyprus
DOI: 10.1109/ISVLSI49217.2020.000-5
6-8 July 2020

[C64] Graptor: efficient pull and push style vectorized graph processing
Hans Vandierendonck
ICS ’20: Proceedings of the 34th ACM International Conference on Supercomputing, Worldwide Online Event
DOI: 10.1145/3392717.3392753
20 June 2020

[C63] PHRYCTORIA: A Messaging System for Transprecision OpenCAPI-attached FPGA Accelerators
Dionysios Diamantopoulos, Mitra Purandare, Burkhard Ringlein, Christoph Hagleitner
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/IPDPSW50202.2020.00023
May 2020

[C62] Combining Learning and Optimization for Transprecision Computing
Andrea Borghesi, Giuseppe Tagliavini, Michele Lombardi, Luca Benini, Michela Milano
17th ACM International Conference on Computing Frontiers (CF’20), Catania, Sicily, Italy
DOI: 10.1145/3387902.3392615
1-10 June 2020

[C61] XwattPilot: A Full-stack Cloud System Enabling Agile Development of Transprecision Software for Low-power SoCs
Dionysios Diamantopoulos, Florian Scheidegger, Stefan Mach, Fabian Schuiki, Germain Haugou, Michael Schaffner, Frank K. Gürkaynak, Christoph Hagleitner, A. Cristiano I. Malossi, Luca Benini
2020 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)
DOI: 10.1109/COOLCHIPS49199.2020.9097644
April 2020

[C60] DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search
I. Tsiokanos, L. Mukhanov, G. Georgakoudis, D. S. Nikolopoulos and G. Karakonstantis
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France
DATE 2020 Best Paper Award
DOI: 10.23919/DATE48585.2020.9116363
April-June 2020

[C59] HaRMony: Heterogeneous-Reliability Memory and QoS-Aware Energy Management on Virtualized Servers
Konstantinos Tovletoglou, Lev Mukhanov, Dimitrios S. Nikolopoulos, Georgios Karakonstantis
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Lausanne, Switzerland
DOI: 10.1145/3373376.3378489
16-20 March 2020

[C58] Mixed-data-model heterogeneous compilation and OpenMP offloading
Andreas Kurth, Koen Wolters, Björn Forsberg, Alessandro Capotondi, Andrea Marongiu, Tobias Grosser, and Luca Benini
CC 2020: 29th International Conference on Compiler Construction, San Diego, CA, USA
DOI: 10.1145/3377555.3377891
22-23 Feb. 2020

[C57] Precision variable anonymization method supporting transprecision computing
Keiya Harada, Henri-Pierre Charles, Hiroaki Nishi
22nd International Conference on Advanced Communication Technology (ICACT)
DOI: 10.23919/ICACT48636.2020.9061512
16-19 Feb. 2020

[C56] System Simulation with PULP Virtual Platform and SystemC
É. F. Zulian, G. Haugou, C. Weis, M. Jung, N. Wehn
International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO), Bologna, Italy.
DOI: 10.1145/3375246.3375256
January, 2020

[C55] Network-Accelerated Non-Contiguous Memory Transfers
Di Girolamo, Salvatore, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, and Torsten Hoefler
International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19), Denver, CO, USA
DOI: 10.1145/3295500.3356189
17-22 Nov. 2019

[C54] Dory: Lightweight memory hierarchy management for deep NN inference on IoT endnodes
A Burrello, F Conti, A Garofalo, D Rossi, L Benini
International Conference on Hardware/Software Codesign and System Synthesis (CODES/ISSS 2019), New York, USA
DOI: 10.1145/3349567.3351726
October 2019

[C53] Fast Validation of DRAM Protocols with Timed Petri Nets
M. Jung, K. Kraft, T. Soliman, C. Sudarshan, C. Weis, N. Wehn
MEMSYS 2019, Washington, DC, USA
DOI: 10.1145/3357526.3357556
Sept. 2019

[C52] Constrained deep neural network architecture search for IoT devices accounting hardware calibration
Florian Scheidegger, Luca Benini, Costas Bekas, Cristiano Malossi
Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
DOI: arXiv:1909.10818
Sept 2019

[C51] NARMADA: Near-memory horizontal diffusion accelerator for scalable stencil computations
Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner
FPL2019 Barcelona, Spain
DOI: 10.1109/FPL.2019.00050
8-12 Sept. 2019

[C50] Cholesky and Gram-Schmidt orthogonalization for tall-and-skinny QR factorization on graphics processors
A. Tomás, E. S. Quintana-Ortí
Euro-Par 2019 Barcelona, Spain
DOI: 10.1007/978-3-030-29400-7_33
August 26-30 2019

[C49] HelmGemm: Managing GPUs and FPGAs for transprecision GEMM workloads in containerized environments
Dionysios Diamantopoulos, Christoph Hagleitner
ASAP2019 Cornell Tech, New York
DOI: 10.1109/ASAP.2019.00-27
July 15-17 2019

[C48] RRAMSpec: A Design Space Exploration Framework for High Density Resistive RAM
D. M. Mathew, A. Chinazzo, C. Weis, M. Jung, B. Giraud, P. Vivet, A. Levisse, N. Wehn
SAMOS2019 Samos Island, Greece
DOI: 10.1007/978-3-030-27562-4_3
July 7-11 2019

[C47] A Lean, Low Power, Low Latency DRAM Memory Controller for Transprecision Computing
C. Sudarshan, J. Lappas, C. Weis, D. M. Mathew, M. Jung, N. Wehn
SAMOS2019 Samos Island, Greece
DOI: 10.1007/978-3-030-27562-4_31
July 7-11 2019

[C46] Low Precision Processing for High Order Stencil Computations
Gagandeep Singh, Dionysios Diamantopoulos, Sander Stuijk, Henk Corporaal, Christoph Hagleitner
SAMOS2019 Samos Island, Greece
DOI: 10.1007/978-3-030-27562-4_29
July 7-11 2019

[C45] Prediction of Time-to-Solution in Material Science Simulations Using Deep Learning
Pittino, Federico, Pietro Bonfà, Andrea Bartolini, Fabio Affinito, Luca Benini, and Carlo Cavazzoni
PASC ’19 Proceedings of the Platform for Advanced Scientific Computing Conference, Zurich, Switzerland
DOI: 10.1145/3324989.3325720
June 2019

[C44] An open source and open hardware deep learning-powered visual navigation engine for autonomous nano-UAVs
Daniele Palossi, Francesco Conti, Luca Benini
15th Annual International Conference on Distributed Computing in Sensor Systems (DCOSS 2019), Santorini Island, Greece
DOI: 10.1109/DCOSS.2019.00111
29 – 31 May, 2019

[C43] An Energy-Efficient IoT node for HMI applications based on an ultra-low power Multicore Processor
Victor Kartsch, Marco Guermandi, Simone Benatti, Fabio Montagna, Luca Benini
2019 IEEE Sensors Applications Symposium (SAS), Sophia Antipolis, France
DOI: 10.1109/SAS.2019.8705984
6 May, 2019

[C42] An In-DRAM Neural Network Processing Engine
C. Sudarshan, J. Lappas, M. M. Ghaffar, V. Rybalkin, C. Weis, M. Jung, N. Wehn
ISCAS, Sapporo, Japan.
DOI: 10.1109/ISCAS.2019.8702458
May, 2019

[C41] NTX: An Energy-efficient Streaming Accelerator for Floating-point Generalized Reduction Workloads in 22 nm FD-SOI
Fabian Schuiki, Michael Schaffner, Luca Benini
DATE 2019, Florence, Italy
DOI: 10.23919/DATE.2019.8715007
28-29 March 2019

[C40] Coherently Attached Programmable Near-Memory Acceleration Platform and its application to Stencil Processing
Jan van Lunteren, Ronald Luijten, Dionysios Diamantopoulos, Florian Auernhammer, Christoph Hagleitner, Lorenzo Chelini, Stefano Corda, Gagandeep Singh
DATE 2019, Florence, Italy
DOI: 10.23919/DATE.2019.8715088
28-29 March 2019

[C39] Low-Power Variation-Aware Cores based on Dynamic Data-Dependent Bitwidth Truncation
I. Tsiokanos, L. Mukhanov and G. Karakonstantis
DATE 2019, Florence, Italy
DOI: 10.23919/DATE.2019.8714942
28-29 March 2019

[C38] Design and Evaluation of SmallFloat SIMD extensions to the RISC-V ISA
G. Tagliavini, S. Mach, D. Rossi, A. Marongiu, L. Benini
DATE 2019, Florence, Italy
DOI: 10.23919/DATE.2019.8714897
28-29 March 2019

[C37] Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
Lukas Cavigelli, Luca Benini
2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hsinchu, Taiwan
DOI: 10.1109/AICAS.2019.8771562
18-20 March 2019

[C36] Quentin: an Ultra-Low-Power PULPissimo SoC in 22nm FDX
Pasquale D. Schiavone, Davide Rossi, Antonio Pullini, Alfio Di Mauro, Francesco Conti
2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), Burlingame, CA, USA
DOI: 10.1109/S3S.2018.8640145
14 Feb 2019

[C35] Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
Andreas Kurth, Pirmin Vogel, Andrea Marongiu, Luca Benini
2018 IEEE 36th International Conference on Computer Design (ICCD), Orlando, FL, USA
DOI: 10.1109/ICCD.2018.00052
17 Jan 2019

[C34] A System-level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping
Dionysios Diamantopoulos, Christoph Hagleitner
FPT’18, Naha, Okinawa, Japan
DOI: 10.1109/FPT.2018.00068
10-14 December 2018

[C33] High-performance GPU implementation of PageRank with reduced precision based on mantissa segmentation
T. Grützmacher, H. Anzt, F. Scheidegger, E. S. Quintana-Ortí.
2018 IEEE/ACM 8th Workshop on Irregular Applications: Architectures and Algorithms (IA3), Dallas, TX, USA
DOI: 10.1109/IA3.2018.00015
12 November 2018

[C32] Independent Body-Biasing of P-N Transistors in an 28nm UTBB FD-SOI ULP Near-Threshold Multi-Core Cluster
Alfio Di Mauro, Davide Rossi, Antonio Pullini, Philippe Flatresse, Luca Benini
2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), Burlingame, CA, USA
DOI: 10.1109/S3S.2018.8640136
15-18 October 2018

[C31] Efficient Coding Scheme for DDR4 Memory Subsystems
K. Kraft, D. M. Mathew, C. Sudarshan, M. Jung, C. Weis, N. Wehn, F. Longnos
MEMSYS 2018, Washington DC, USA
Best Paper Award
DOI: 10.1145/3240302.3240424
1-4 October 2018

[C30] Driving Into the Memory Wall: The Role of Memory for Advanced Driver Assistance Systems and Autonomous Driving
M. Jung, S. A. McKee, C. Sudarshan, C. Dropmann, C. Weis, N. Wehn
MEMSYS 2018, Washington DC, USA
DOI: 10.1145/3240302.3240322
1-4 October 2018

[C29] Quantized NNs as the Definitive Solution for Inference on Low-Power ARM MCUs
M. Rusci, A. Capotondi, F. Conti, L. Benini
Accepted for publication at Embedded System Week 2018 (ESWEEK 2018) , Torino, Italy
DOI: 10.1109/CODESISSS.2018.8525915
30 Sept – 5 Oct 2018

[C28] The Cost of Remembering
Luca Gammaitoni, Igor Neri, Miquel López-Suárez, Davide Chiuchiù, Maria Cristina Diamantini
5th International Conference on Applications in Nonlinear Dynamics
DOI: 10.1007/978-3-030-10892-2_1
5-9 Aug. 2018

[C27] Variation-aware pipelined cores through path shaping and dynamic cycle adjustment: Case study on a floating-point unit
I. Tsiokanos, L. Mukhanov, D. S. Nikolopoulos, and G. Karakonstantis
ISLPED 2018, Seattle, WA, USA
DOI: 10.1145/3218603.3218617
23-25 July 2018

[C26] Minimization of timing failures in pipelined designs via path shaping and operand truncation
I. Tsiokanos, L. Mukhanov, D. S. Nikolopoulos, and G. Karakonstantis
IOLTS 2018, Platja d’Aro, Costa Brava, Spain
DOI: 10.1109/IOLTS.2018.8474084
2-4 July 2018

[C25] Residual replacement in mixed-precision iterative refinement for sparse linear systems
H. Anzt, G. Flegar, V. Novakovic, E. S. Quintana-Ortí, A. Tomás.
ISC Workshops 2018, Frankfurt/Main, Germany
28 June 2018
DOI: 10.1007/978-3-030-02465-9_39

[C24] The Role of Memories in Transprecision Computing
C. Weis, M. Jung, É. F. Zulian, C. Sudarshan, D. M. Mathew, N. Wehn
IEEE International Symposium on Circuits & Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351768
27-30 May 2018

[C23] A Transprecision Floating-Point Architecture for Energy-Efficient Embedded Computing
Stefan Mach, Davide Rossi, Giuseppe Tagliavini, Andrea Marongiu, Luca Benini
IEEE International Symposium on Circuits & Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351816
27-30 May 2018

[C22] A Heterogeneous Cluster with Reconfigurable Accelerator for Energy Efficient Near-Sensor Data Analytics
S. Das, K. J. M. Martin, P. Coussy and D. Rossi
IEEE International Symposium on Circuits & Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351749
27-30 May 2018

[C21] Design Automation for Binarized Neural Networks: A Quantum Leap Opportunity?
Manuele Rusci, Lukas Cavigelli, Luca Benini
ISCAS 2018, Florence, Italy
DOI: 10.1109/ISCAS.2018.8351807
4 May, 2018

[C20] ecTALK: Energy efficient coherent transprecision accelerators — The bidirectional long short-term memory neural network case
Dionysios Diamantopoulos
2018 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)
DOI: 10.1109/CoolChips.2018.8373077
18-20 April, 2018

[C19] Fast Blocking of Householder Reflectors on Graphics Processors
Andrés Tomás, Enrique S. Quintana-Ortí
26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2018)
DOI: 10.1109/PDP2018.2018.00068
21-23 March 2018

[C18] Extending the POWER Architecture with Transprecision Co-Processors
Heiner Giefers, Dionysios Diamantopoulos
International Symposium on Circuits and Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351755
March 2018

[C17] The Transprecision Computing Paradigm: Concept, Design, and Applications
A. Cristiano I. Malossi, Michael Schaffner, Anca Molnos, Luca Gammaitoni, Giuseppe Tagliavini, Andrew Emerson, Andrés Tomás, Dimitrios S. Nikolopoulos, Eric Flamand, Norbert Wehn
Accepted for publication, IEEE Conference Design, Automation and Test in Europe (DATE 2018), Dresden, Germany.
DOI: 10.23919/DATE.2018.8342176
March 2018

[C16] A Transprecision Floating-Point Platform for Ultra-Low Power Computing
Giuseppe Tagliavini, Stefan Mach, Davide Rossi, Andrea Marongiu, Luca Benini
IEEE Conference Design, Automation and Test in Europe (DATE), Dresden, Germany.
DOI: 10.23919/DATE.2018.8342167
March, 2018.

[C15] An Analysis on Retention Error Behavior and Power Consumption of Recent DDR4 DRAMs
D. M. Mathew, M. Schultheis, C. Rheinländer, C. Sudarshan, M. Jung, C. Weis, N. Wehn.
Accepted for publication, IEEE Conference Design, Automation and Test in Europe (DATE), Dresden, Germany.
DOI: 10.23919/DATE.2018.8342023
March, 2018.

[C14] Improving the Error Behavior of DRAM by Exploiting its Z-Channel Property
K. Kraft, M. Jung, C. Sudarshan, D. M. Mathew, C. Weis, N. Wehn.
IEEE Conference Design, Automation and Test in Europe (DATE 2018), Dresden, Germany.
DOI: 10.23919/DATE.2018.8342249
March, 2018.

[C13] Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators
Andrés Tomás, Rafael Rodríguez-Sánchez, Sandra Catalán, Rocío Carratalá-Sáez, Enrique S. Quintana-Ortí
The 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM’18), Vienna, Austria
DOI: 10.1145/3178442.3178448
24-28 February 2018

[C12] Reducing precision in floating-point representation and arithmetic: Impact on PageRank and BLSTM
F. Scheidegger, A. C. I. Malossi, C. Bekas, L. Benini
4th Workshop on Approximate Computing (WAPCO 2018), Manchester, England
DOI: Not published
22 January 2018

[C11] A Transprecision Floating-Point Architecture for Energy-Efficient Embedded Computing
F. Scheidegger, A. C. I. Malossi, C. Bekas, L. Benini
4th Workshop on Approximate Computing (WAPCO 2018), Manchester, England
DOI: 10.1109/ISCAS.2018.8351816
22 January 2018

[C10] Integrating DRAM Power-Down Modes in gem5 and Quantifying their Impact
R. Jagtap, M. Jung, W. Elsasser, C. Weis, A. Hansson, N. Wehn.
International Symposium on Memory Systems (MEMSYS 2017), Washington, DC, USA.
DOI: 10.1145/3132402.3132444
October, 2017
Best Paper Award

[C9] Using Run-Time Reverse-Engineering to Optimize DRAM Refresh
D. M. Mathew, É. F. Zulian, M. Jung, K. Kraft, C. Weis, B. Jacob, N. Wehn.
International Symposium on Memory Systems (MEMSYS 2017), Washington, DC, USA.
DOI: 10.1145/3132402.3132419
October, 2017

[C8] HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA
Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini
Computer Architecture Research with RISC-V Workshop (CARRV ’17), Boston, MA, USA
DOI: 10.3929/ethz-b-000219249
14 Oct. 2017

[C7] Approximate DIV and SQRT Instructions for the RISC-V ISA: An Efficiency vs. Accuracy Analysis
L. Li, M. Gautschi, L. Benini
2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), Thessaloniki, Greece
DOI: 10.1109/PATMOS.2017.8106987
25-27 Sept. 2017

[C6] CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data
Lukas Cavigelli, Philippe Degen, and Luca Benini
11th International Conference on Distributed Smart Cameras (ICDSC 2017)
DOI: 10.1145/3131885.3131906
Sept. 2017

[C5] Balanced CSR sparse matrix-vector product on graphics processors
Goran Flegar, Enrique S. Quintana-Orti
23rd International Conference on Parallel and Distributed Computing (Euro-Par 2017)
DOI: 10.1007/978-3-319-64203-1_50
28 August 2017

[C4] Impact of Temporal Subsampling on Accuracy and Performance in Practical Video Classification
F. Scheidegger, L. Cavigelli, M. Schaffner, A. C. I. Malossi, C. Bekas, L. Benini
25th European Signal Processing Conference (EUSIPCO-2017)
DOI: 10.23919/EUSIPCO.2017.8081357
28 August 2017

[C3] A sub-10mW real-time implementation for EMG hand gesture recognition based on a multi-core biomedical SoC
S. Benatti, G. Rovere, J. Boesser, F. Montagna, E. Farella, F. Glaser, P. Schoenle, T. Burger, S. Fateh, Q. Huang, L. Benini
7th IEEE International Workshop on Advances in Sensors and Interfaces (IWASI 2017), Vieste (FG), Italy
DOI: 10.1109/IWASI.2017.7974234
15-16 June 2017

[C2] CAS-CNN: A deep convolutional neural network for image compression artifact suppression
Lukas Cavigelli, Pascal Hager, Luca Benini
2017 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/IJCNN.2017.7965927
14 – 19 May 2017

[C1] Variable-Size Batched LU for Small Matrices and its Integration into Block-Jacobi Preconditioning
Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Orti
46th International Conference on Parallel Processing (ICPP-2017)
DOI: 10.1109/ICPP.2017.18
15 March 2017

Talks

[T18] Near-Memory Transprecision Accelerators using CAPI/OpenCAPI and SNAP
Dionysios Diamantopoulos
OpenPOWER Summit Europe, Lyon, France
31 Oct. – 1 Nov. 2019

[T17] Transprecision Computing For Energy Efficiency
Cristiano Malossi
Workshop on Approximate Computing (AxC), DATE Conference 2019, Florence, Italy
March 2019

[T16] PULP: An Open-Source RISC-V based Multi-Core Platform for In-Sensor Analytics
Davide Rossi
Workshop on Open Source Design Automation (OSDA) at DATE 2019, Florence, Italy
29 Mar 2019

[T15] Mr.Wolf: A RISC-V Parallel Ultra-Low-Power SoC for IoT Edge Processing
Davide Rossi
Multicore Day 2018, Kista, Sweden
26 Nov 2018

[T14] PULP: A Transprecision Multi-Core Platform for Micropower In-Sensor Analytics
Davide Rossi
The 16th International System-on-Chip (SoC) Conference, Exhibit, and Workshops, Irvine, CA, USA
17 Oct 2018

[T13] Hardware and Software Support for Transprecision Computing on Ultra-Low-Power Embedded Systems
Giuseppe Tagliavini
3rd Italian Workshop on Embedded Systems (IWES 2018), Siena, Italy
13 Oct 2018

[T12] Transprecision Computing
Dionysios Diamantopoulos
OpenPOWER Summit Europe, RAI Center Amsterdam
3 Oct 2018

[T11] GAP-8: A RISC-V SoC for AI at the Edge of the IoT
Eric Flamand
ASAP 2018, Milan, Italy
12 July 2018

[T10] Adaptive Mixed Precision Kernel Recursive Least Squares
Junkyu Lee
Adaptive Many-Core Architectures and Systems workshop, York, England
15 June 2018

[T9] Quentin: A Near-Threshold SoC for Energy-Efficient IoT End-Nodes in 22nm FDX Technology
Davide Rossi
DATE 2018, Dresden, Germany
23 March 2018

[T8] Ultra-Low-Power Digital Architectures for the Internet of Things
Davide Rossi
ISQED, Santa Clara, CA, USA
3 March 2018

[T7] An Open Source RISC-V Based Heterogeneous Cluster with Reconfigurable Accelerator for Energy Efficient Near-Sensor Data Analytics
Davide Rossi
Embedded World 2018, Nuremberg, Germany
27 Feb 2018

[T6] Exploiting Numerical Properties towards Energy Saving: A Case Study
Junkyu Lee
The 4th Workshop on Approximate Computing (WAPCO 2018), HiPEAC, Manchester, England
January 2018

[T5] Sub-pJ per Operation Scalable Computing with the PULP Platform
Davide Rossi
MCC2017, Uppsala, Sweden
November 30 2017

[T4] Thermodynamics and Statistical Mechanics of Small Systems
Luca Gammaitoni
Thermodynamics and Statistical Mechanics of Small Systems workshop, Roma, Italy
18-20 Sep 2017

[T3] Transprecision Computing Towards Energy Saving
JunKyu Lee
Bio4Comp Workshop, Dresden, Germany
13 September 2017

[T2] Fundamental energy costs for memory preservation
I. Neri, D. Chiuchiù, M. López Suárez, C. Diamantini and L. Gammaitoni
Micro Energy 2017, Gubbio, Italy
July 3-7 2017

[T1] Smart Integrated Microsystems for the IoT: The Energy Efficiency Challenge
Davide Rossi
WEEE Conference 2017, Dresden, Germany
June 2017

Poster Presentations

[P1] OPRECOMP Poster Presentations
HiPEAC Conference, 22-24 January 2018, Manchester, UK
International CAE Conference, 06-07 November 2017, Vicenza, Italy

[P2] “Energy-Efficient Transprecision Techniques for Iterative Refinement” Poster Presentations
SuperComputing 2017, 12-17 November 2017, Denver, CO, USA
