Scientific Publications

[J21] Towards a modular precision ecosystem for high performance computing
H. Anzt, G. Flegar, T. Grützmacher, E. S. Quintana-Ortí
To be published in Int. J. of High Performance Computing Applications 
2019
DOI:

[J20] Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
R. Andri, L. Cavigelli, D. Rossi and L. Benini
IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
June 2019
DOI: 10.1109/JETCAS.2019.2905654

[J19] Online Learning and Classification of EMG-Based Gestures on a Parallel Ultra-Low Power Platform Using Hyperdimensional Computing
S. Benatti, F. Montagna, V. Kartsch, A. Rahimi, D. Rossi and L. Benini
IEEE Transactions on Biomedical Circuits and Systems  
June 2019
DOI: 10.1109/TBCAS.2019.2914476

[J18] Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing
A. Pullini, D. Rossi, I. Loi, G. Tagliavini and L. Benini
IEEE Journal of Solid-State Circuits  
15 May, 2019
DOI: 10.1109/JSSC.2019.2912307

[J17] Significance-Driven Data Truncation for Preventing Timing Failures
Ioannis Tsiokanos, Lev Mukhanov, Dimitrios S. Nikolopoulos, Georgios Karakonstantis
IEEE Transactions on Device and Materials Reliability  
12 February, 2019
DOI: 10.1109/TDMR.2019.2898949

[J16] Dynamic look-ahead in the reduction to band form for the Singular Value Decomposition
A. E. Tomás, R. Rodríguez-Sánchez, S. Catalán, R. Carratalá, E. S. Quintana-Ortí
Parallel Computing, Volume 81  
January 2019
DOI: 10.1109/TDMR.2019.2898949

[J15] NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs
Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, and Luca Benini
ACM Trans. Reconfigurable Technol. Syst. 
December, 2018
DOI: 10.1145/3284357

[J14] FlexFloat: A Software Library for Transprecision Computing
Giuseppe Tagliavini, Andrea Marongiu, Luca Benini
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) 
December, 2018
DOI: 10.1109/TCAD.2018.2883902

[J13] A sensor fusion approach for drowsiness detection in wearable ultra-low-power systems
Victor Javier Kartsch, Simone Benatti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini
Science Direct, Information Fusion, Volume 43, Pages 66-76 
September, 2018
DOI: 10.1016/j.inffus.2017.11.005

[J12] Energy-Efficient Iterative Refinement using Dynamic Precision
JunKyu Lee, Hans Vandierendonck, Mahwish Arif, Gregory D. Peterson, Dimitrios S. Nikolopoulos
IEEE Journal of Emerging and Selected Topics in Circuits and Systems  
25 June, 2018
DOI: 10.1109/JETCAS.2018.2850665

[J11] An Energy-Efficient Integrated Programmable Array Accelerator and Compilation flow for Near-Sensor Ultra-low Power Processing
S. Das, K. J. M. Martin, D. Rossi, P. Coussy and L. Benini
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018
8 May, 2018
DOI: 10.1109/TCAD.2018.2834397

[J10] The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores
I. Loi, A. Capotondi, D. Rossi, A. Marongiu and L. Benini
IEEE Transactions on Parallel and Distributed Systems, Volume 29, Issue 2
April-June, 2018
DOI: 10.1109/TPDS.2017.2752706

[J9] Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes
E. Azarkhish, D. Rossi, I. Loi and L. Benini
IEEE Transactions on Parallel and Distributed Systems, Volume 29, Issue 2
1 February, 2018
DOI: 10.1109/TPDS.2017.2752706

[J8] A Machine Learning Approach for Automated Wide-Range Frequency Tagging Analysis in Embedded Neuromonitoring Systems
Fabio Montagna, Marco Buiatti, Simone Benatti, Davide Rossi, Elisabetta Farella, Luca Benini
Science Direct (Parallel Computing)
22 June, 2017
DOI: 10.1016/j.ymeth.2017.06.019

[J7] Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers
Hartwig Anzt, Jack Dongarra, Goran Flegar, Nicholas Higham, Enrique S. Quintana-Ortí
Concurrency and Computation (Practice and Experience)
12 March 2018
DOI: 10.1002/cpe.4460

[J6] Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD
Rafael Rodríguez-Sánchez, Sandra Catalán, Jose R. Herrero, Enrique S. Quintana-Ortí, Andrés Tomás
Numerical Algorithms
March 2018
DOI: 10.1007/s11075-018-0500-8

[J5] Cost of remembering a bit of information
D. Chiuchiù, M. López-Suárez, I. Neri, M.C. Diamantini, L. Gammaitoni
Physical Review A
8 May, 2018
DOI: 10.1103/PhysRevA.97.052108

[J4] Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors
Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Ortí
Parallel Computing
22 January, 2018
DOI: 10.1016/j.parco.2017.12.006

[J3] Flexible, Scalable and Energy Efficient Bio-Signals on the PULP Platform: A Case Study on Seizure Detection
Fabio Montagna, Simone Benatti, Davide Rossi
Journal of Low Power Electronics and Applications 
2017, 7(2), 16
DOI: 10.3390/jlpea7020016

[J2] An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
Francesco Conti, Robert Schilling, Pasquale D. Schiavone, Antonio Pullini, Davide Rossi, Frank K. Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini
IEEE Transactions on Circuits and Systems I: Regular Papers, Volume: PP, Issue: 99
13 May 2017
DOI: 10.1109/TCSI.2017.2698019

[J1] A Prosthetic Hand Body Area Controller Based on Efficient Pattern Recognition Control Strategies
S. Benatti, B. Milosevic, E. Farella, E. Gruppioni, L. Benini
Sensors 2017, Basel, Switzerland
15 April 2017
DOI: 10.3390/s17040869

Conferences

[C40] Constrained deep neural network architecture search for IoT devices accounting hardware calibration
Florian Scheidegger, Luca Benini, Costas Bekas, Cristiano Malossi
Submitted
DOI:

[C39] NARMADA: Near-memory horizontal diffusion accelerator for scalable stencil computations
Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner
Accepted for presentation at FPL 2019, Barcelona, Spain
DOI:
September 9-11 2019

[C38] Cholesky and Gram-Schmidt orthogonalization for tall-and-skinny QR factorization on graphics processors
A. Tomás, E. S. Quintana-Ortí
Accepted for presentation at Euro-Par 2019, Göttingen, Germany
DOI:
August 26-30 2019

[C37] HelmGemm: Managing GPUs and FPGAs for transprecision GEMM workloads in containerized environments
Dionysios Diamantopoulos, Christoph Hagleitner
Accepted for presentation at ASAP 2019, Cornell Tech, New York, USA
DOI:
July 15-17 2019

[C36] RRAMSpec: A Design Space Exploration Framework for High Density Resistive RAM
D. M. Mathew, A. Chinazzo, C. Weis, M. Jung, B. Giraud, P. Vivet, A. Levisse, N. Wehn
Accepted for presentation at SAMOS 2019, Samos Island, Greece
DOI:
July 7-11 2019

[C35] A Lean, Low Power, Low Latency DRAM Memory Controller for Transprecision Computing
C. Sudarshan, J. Lappas, C. Weis, D. M. Mathew, M. Jung, N. Wehn
Accepted for presentation at SAMOS 2019, Samos Island, Greece
DOI:
July 7-11 2019

[C34] Iterative high-order stencils for transprecise co-processors
Gagandeep Singh, Dionysios Diamantopoulos, Sander Stuijk, Henk Corporaal, Christoph Hagleitner
Accepted for presentation at SAMOS 2019, Samos Island, Greece
DOI:
July 7-11 2019

[C31] An In-DRAM Neural Network Processing Engine
C. Sudarshan, J. Lappas, M. M. Ghaffar, V. Rybalkin, C. Weis, M. Jung, N. Wehn
IEEE International Symposium on Circuits & Systems (ISCAS 2019), Sapporo, Japan
DOI: 10.1109/ISCAS.2019.8702458
May, 2019

[C30] Coherently Attached Programmable Near-Memory Acceleration Platform and its application to Stencil Processing
Jan van Lunteren, Ronald Luijten, Dionysios Diamantopoulos, Florian Auernhammer, Christoph Hagleitner, Lorenzo Chelini, Stefano Corda, Gagandeep Singh
Design, Automation and Test in Europe (DATE 2019), Florence, Italy
DOI: 10.23919/DATE.2019.8715088
28-29 March 2019

[C29] Low-Power Variation-Aware Cores based on Dynamic Data-Dependent Bitwidth Truncation
I. Tsiokanos, L. Mukhanov and G. Karakonstantis
Design, Automation and Test in Europe (DATE 2019), Florence, Italy
DOI: 10.23919/DATE.2019.8714942
28-29 March 2019

[C28] Design and Evaluation of SmallFloat SIMD extensions to the RISC-V ISA
G. Tagliavini, S. Mach, D. Rossi, A. Marongiu, L. Benini
Design, Automation and Test in Europe (DATE 2019), Florence, Italy
DOI: 10.23919/DATE.2019.8714897
28-29 March 2019

[C27] A System-level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping
Dionysios Diamantopoulos, Christoph Hagleitner
FPT’18, Naha, Okinawa, Japan
DOI:
10-14 December 2018

[C26] High-performance GPU implementation of PageRank with reduced precision based on mantissa segmentation
T. Grützmacher, H. Anzt, F. Scheidegger, E. S. Quintana-Ortí
2018 IEEE/ACM 8th Workshop on Irregular Applications: Architectures and Algorithms (IA3), Dallas, TX, USA
DOI: 10.1109/IA3.2018.00015
12 November 2018

[C25] Quantized NNs as the Definitive Solution for Inference on Low-Power ARM MCUs
M. Rusci, A. Capotondi, F. Conti, L. Benini
Accepted for publication at Embedded Systems Week 2018 (ESWEEK 2018), Torino, Italy
DOI: 10.1109/CODESISSS.2018.8525915
30 Sept – 5 Oct 2018

[C24] Variation-aware pipelined cores through path shaping and dynamic cycle adjustment: Case study on a floating-point unit
I. Tsiokanos, L. Mukhanov, D. S. Nikolopoulos, and G. Karakonstantis
ISLPED 2018, Seattle, WA, USA
DOI: https://dl.acm.org/citation.cfm?id=3218617
23-25 July 2018

[C23] Minimization of timing failures in pipelined designs via path shaping and operand truncation
I. Tsiokanos, L. Mukhanov, D. S. Nikolopoulos, and G. Karakonstantis
IOLTS 2018, Platja d’Aro, Costa Brava, Spain
DOI: 10.1109/IOLTS.2018.8474084
2-4 July 2018

[C22] Residual replacement in mixed-precision iterative refinement for sparse linear systems
H. Anzt, G. Flegar, V. Novakovic, E. S. Quintana-Ortí, A. Tomás
ISC Workshops 2018, Frankfurt/Main, Germany
DOI: 10.1007/978-3-030-02465-9_39
28 June 2018

[C21] The Role of Memories in Transprecision Computing
C. Weis, M. Jung, É. F. Zulian, C. Sudarshan, D. M. Mathew, N. Wehn
IEEE International Symposium on Circuits & Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351768
27-30 May 2018

[C20] A Transprecision Floating-Point Architecture for Energy-Efficient Embedded Computing
Stefan Mach, Davide Rossi, Giuseppe Tagliavini, Andrea Marongiu, Luca Benini
IEEE International Symposium on Circuits & Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351816
27-30 May 2018

[C19] A Heterogeneous Cluster with Reconfigurable Accelerator for Energy Efficient Near-Sensor Data Analytics
S. Das, K. J. M. Martin, P. Coussy and D. Rossi
IEEE International Symposium on Circuits & Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351749
27-30 May 2018

[C18] ecTALK: Energy efficient coherent transprecision accelerators — The bidirectional long short-term memory neural network case
Dionysios Diamantopoulos
2018 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)
18-20 April, 2018
DOI: 10.1109/CoolChips.2018.8373077

[C17] Fast Blocking of Householder Reflectors on Graphics Processors
Andrés Tomás, Enrique S. Quintana-Ortí
26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2018)
DOI: 10.1109/PDP2018.2018.00068
21-23 March 2018

[C16] Extending the POWER Architecture with Transprecision Co-Processors
Heiner Giefers, Dionysios Diamantopoulos
IEEE International Symposium on Circuits & Systems (ISCAS 2018), Florence, Italy
DOI: 10.1109/ISCAS.2018.8351755
27-30 May 2018

[C15] The Transprecision Computing Paradigm: Concept, Design, and Applications
A. Cristiano I. Malossi, Michael Schaffner, Anca Molnos, Luca Gammaitoni, Giuseppe Tagliavini, Andrew Emerson, Andrés Tomás, Dimitrios S. Nikolopoulos, Eric Flamand, Norbert Wehn
Accepted for publication, IEEE Conference Design, Automation and Test in Europe (DATE 2018), Dresden, Germany
DOI: 10.23919/DATE.2018.8342176
March 2018

[C14] A Transprecision Floating-Point Platform for Ultra-Low Power Computing
Giuseppe Tagliavini, Stefan Mach, Davide Rossi, Andrea Marongiu, Luca Benini
IEEE Conference Design, Automation and Test in Europe (DATE 2018), Dresden, Germany
DOI: 10.23919/DATE.2018.8342167
March 2018

[C13] An Analysis on Retention Error Behavior and Power Consumption of Recent DDR4 DRAMs
D. M. Mathew, M. Schultheis, C. Rheinländer, C. Sudarshan, M. Jung, C. Weis, N. Wehn.
Accepted for publication, IEEE Conference Design, Automation and Test in Europe (DATE 2018), Dresden, Germany
DOI: 10.23919/DATE.2018.8342023
March 2018

[C12] Improving the Error Behavior of DRAM by Exploiting its Z-Channel Property
K. Kraft, M. Jung, C. Sudarshan, D. M. Mathew, C. Weis, N. Wehn.
IEEE Conference Design, Automation and Test in Europe (DATE 2018), Dresden, Germany
DOI: 10.23919/DATE.2018.8342249
March 2018

[C11] Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators
Andrés Tomás, Rafael Rodríguez-Sánchez, Sandra Catalán, Rocío Carratalá-Sáez, Enrique S. Quintana-Ortí
The 9th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM’18), Vienna, Austria
DOI: 10.1145/3178442.3178448
24-28 February 2018

[C10] Reducing precision in floating-point representation and arithmetic: Impact on PageRank and BLSTM
F. Scheidegger, A. C. I. Malossi, C. Bekas, L. Benini
4th Workshop On Approximate Computing (WAPCO 2018), Manchester, England
DOI: Not published
22 January 2018

[C9] A Transprecision Floating-Point Architecture for Energy-Efficient Embedded Computing
F. Scheidegger, A. C. I. Malossi, C. Bekas, L. Benini
4th Workshop On Approximate Computing (WAPCO 2018), Manchester, England
DOI: Not published
22 January 2018

[C8] Integrating DRAM Power-Down Modes in gem5 and Quantifying their Impact
R. Jagtap, M. Jung, W. Elsasser, C. Weis, A. Hansson, N. Wehn.
International Symposium on Memory Systems (MEMSYS 2017), Washington, DC, USA [Nominated for best paper award]
DOI: 10.1145/3132402.3132444
October, 2017

[C7] Using Run-Time Reverse-Engineering to Optimize DRAM Refresh
D. M. Mathew, É. F. Zulian, M. Jung, K. Kraft, C. Weis, B. Jacob, N. Wehn.
International Symposium on Memory Systems (MEMSYS 2017), Washington, DC, USA.
DOI: 10.1145/3132402.3132419
October, 2017

[C6] Approximate DIV and SQRT instructions for the RISC-V ISA: An Efficiency vs. Accuracy Analysis
L. Li, M. Gautschi, L. Benini
2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), Thessaloniki, Greece
DOI:  10.1109/PATMOS.2017.8106987
25-27 Sept. 2017

[C5] Balanced CSR sparse matrix-vector product on graphics processors
Goran Flegar, Enrique S. Quintana-Ortí
23rd International Conference on Parallel and Distributed Computing (Euro-Par 2017)
DOI: 10.1007/978-3-319-64203-1_50
28 August 2017

[C4] Impact of Temporal Subsampling on Accuracy and Performance in Practical Video Classification
F. Scheidegger, L. Cavigelli, M. Schaffner, A. C. I. Malossi, C. Bekas, L. Benini
25th European Signal Processing Conference (EUSIPCO-2017)
DOI: 10.23919/EUSIPCO.2017.8081357
28 August 2017

[C3] A sub-10mW real-time implementation for EMG hand gesture recognition based on a multi-core biomedical SoC
S. Benatti, G. Rovere, J. Boesser, F. Montagna, E. Farella, F. Glaser, P. Schoenle, T. Burger, S. Fateh, Q. Huang, L. Benini
7th IEEE International Workshop on Advances in Sensors and Interfaces (IWASI 2017), Vieste (FG), Italy
DOI: 10.1109/IWASI.2017.7974234
15-16 June 2017

[C2] Fundamental energy costs for memory preservation
I. Neri, D. Chiuchiù, M. López Suárez, C. Diamantini and L. Gammaitoni
Micro Energy 2017, Gubbio, Italy
July 3-7 2017

[C1] Variable-Size Batched LU for Small Matrices and its Integration into Block-Jacobi Preconditioning
Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Ortí
46th International Conference on Parallel Processing (ICPP-2017)
DOI: 10.1109/ICPP.2017.18
15 March 2017

Talks

[T8] Transprecision Computing
Dionysios Diamantopoulos
OpenPOWER Summit Europe, RAI Center Amsterdam
October 3-4 2018

[T7] Adaptive Mixed Precision Kernel Recursive Least Squares
Junkyu Lee
Adaptive Many-Core Architectures and Systems workshop, York, England
June 2018

[T6] Exploiting Numerical Properties towards Energy Saving: A Case Study 
Junkyu Lee
The 4th Workshop on Approximate Computing (WAPCO 2018), HiPEAC 2018, Manchester, England
January 2018

[T5] Sub-pJ per Operation Scalable Computing with the PULP Platform
Davide Rossi
MCC2017, Uppsala, Sweden
November 30 2017

[T4] Thermodynamics and Statistical Mechanics of Small Systems
Luca Gammaitoni
Thermodynamics and Statistical Mechanics of Small Systems workshop
September 2017

[T3] Transprecision Computing Towards Energy Saving
JunKyu Lee
Bio4Comp Workshop, Dresden, Germany
13 September 2017

[T2] Fundamental energy costs for memory preservation
Davide Rossi
Micro Energy 2017, Gubbio, Italy
July 2017

[T1] Smart Integrated Microsystems for the IoT: The Energy Efficiency Challenge
Davide Rossi
WEEE Conference 2017, Dresden, Germany
June 2017

Poster Presentations

[P1] OPRECOMP Poster Presentations
HiPEAC Conference, 22-24 January 2018, Manchester, UK
International CAE Conference, 06-07 November 2017, Vicenza, Italy

[P2] “Energy-Efficient Transprecision Techniques for Iterative Refinement” Poster Presentations
SuperComputing 2017, 12-17 November 2017, Denver, CO, USA