- IEEE Member: US $11.00
- Society Member: US $0.00
- IEEE Student Member: US $11.00
- Non-IEEE Member: US $15.00
Monaural Speech Enhancement Using Intra-Spectral Recurrent Layers In The Magnitude And Phase Responses
Speech enhancement has greatly benefited from deep learning. Currently, the best performing deep architectures use long short-term memory (LSTM) recurrent neural networks (RNNs) to model short and long temporal dependencies. These approaches, however, und
Frame-Level MMI As A Sequence Discriminative Training Criterion For LVCSR
In this work we present frame-level maximum mutual information (MMI) as a novel sequence discriminative training criterion for hybrid HMM-DNN acoustic models. Compared to the standard, sequence-level MMI criterion we show that frame-level MMI has increase
Self-Attentive Sentimental Sentence Embedding For Sentiment Analysis
We propose the use of a word-level sentiment bidirectional LSTM in tandem with the self-attention mechanism for sentence-level sentiment prediction. In addition to the proposed model, we also present a finance report dataset for sentence-level financial
Fast And High-Quality Singing Voice Synthesis System Based On Convolutional Neural Networks
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing v
Exploring A Zero-Order Direct HMM Based On Latent Attention For Automatic Speech Recognition
In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) which enables an integration of attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden
Multi-Label Sound Event Retrieval Using A Deep Learning-Based Siamese Structure With A Pairwise Presence Matrix
Realistic recordings of soundscapes often have multiple sound events co-occurring, such as car horns, engine and human voices. Sound event retrieval is a type of content-based search aiming at finding audio samples, similar to an audio query based on the
Multi-Task Learning Via SA-FPN And EJ-Head
As a concise framework, Mask R-CNN achieves promising performance in object detection and instance segmentation. However, there is room for improvement in two aspects. One is that performing multi-task prediction needs more credible feature extraction and
Zero-Crossing Precoding With Maximum Distance To The Decision Threshold For Channels With 1-Bit Quantization And Oversampling
Low-resolution devices are promising for systems that demand low energy consumption and low complexity as required in IoT systems. In this study, we propose a novel waveform for bandlimited channels with 1-bit quantization and oversampling at the receiver
End-To-End Architectures For ASR-Free Spoken Language Understanding
Spoken Language Understanding (SLU) is the problem of extracting the meaning from speech utterances. It is typically addressed as a two-step problem, where an Automatic Speech Recognition (ASR) model is employed to convert speech into text, followed by a
Upscaling Vector Approximate Message Passing
In this paper we consider the problem of recovering a signal x of size N from noisy and compressed measurements y = A x + w of size M, where the measurement matrix A is right-orthogonally invariant (ROI). Vector Approximate Message Passing (VAMP) demonstr
Selection-Channel-Aware Reverse JPEG Compatibility For Highly Reliable Steganalysis Of JPEG Images
This paper deeply studies the principle of the recent reverse JPEG compatibility attack [1]. This analysis allows us to cast the problem of hidden data detection in DCT coefficients within hypothesis testing theory. The optimal LR test, though efficient,
A Novel Moving Sparse Array Geometry With Increased Degrees Of Freedom
In this paper, we propose a novel moving sparse array geometry named dilated arrays (DAs) by extending the dilation of nested arrays to other linear array structures. The theoretical analysis of dilation to other arrays is not straightforward since the re
Robust Transmission Over Channels With Channel Uncertainty: An Algorithmic Perspective
The availability and quality of channel state information heavily influences the performance of wireless communication systems. For perfect channel knowledge, optimal signal processing and coding schemes are well studied and often closed-form solutions ar
Effective Approximate Maximum Likelihood Estimation Of Angles Of Arrival For Non-Coherent Sub-Arrays
We consider the problem of estimating the angles of arrival (AOAs) of multiple sources from a single snapshot obtained by a set of non-coherent sub-arrays, i.e., while the antenna elements in each sub-array are coherent, each sub-array observes a differen
Mental Fatigue Prediction From Multi-Channel ECoG Signal
Early detection of mental fatigue and changes in vigilance could be used to initiate neurostimulation to treat patients suffering from brain injury and mental disorders. In this study, we analyzed electrocorticography (ECoG) signals chronically recorded f
Knowledge Distillation And Random Erasing Data Augmentation For Text-Dependent Speaker Verification
This paper explores the Knowledge Distillation (KD) approach and a data augmentation technique to improve the generalization ability and robustness of text-dependent speaker verification (SV) systems. The KD method consists of two neural networks, known a
Unsupervised Change Detection For Multimodal Remote Sensing Images Via Coupled Dictionary Learning And Sparse Coding
Archetypal scenarios for change detection generally consider two images acquired through sensors of the same modality. The resolution dissimilarity is often bypassed through a simple preprocessing, applied independently on each image to bring them to the s
Acoustic Scene Classification Using Deep Residual Networks With Late Fusion Of Separated High And Low Frequency Paths
We investigate the problem of acoustic-scene classification, using a deep residual network applied to log-mel spectrograms complemented by log-mel deltas and delta-deltas. We design the network to take into account that the temporal and frequency axes in
Detection Of Speech Events And Speaker Characteristics Through Photo-Plethysmographic Signal Neural Processing
The use of photoplethysmogram signal (PPG) for heart and sleep monitoring is commonly found nowadays in smartphones and wrist wearables. Besides common usages, it has been proposed and reported that person information can be extracted from PPG for other u
Improved End-To-End Spoken Utterance Classification With A Self-Attention Acoustic Classifier
While human language provides a natural interface for human-machine communication, there are several challenges concerning extracting the intents of a speaker when interacting with a virtual agent, especially when the speaker is in a noisy acoustic enviro
High Dynamic Range Imaging Using Deep Image Priors
Traditionally, dynamic range enhancement for images has involved a combination of contrast improvement (via gamma correction or histogram equalization) and a denoising operation to reduce the effects of photon noise. More recently, modulo-imaging methods
Low-Frequency Compensated Synthetic Impulse Responses For Improved Far-Field Speech Recognition
We propose a method for generating low-frequency compensated synthetic impulse responses that improve the performance of far-field speech recognition systems trained on artificially augmented datasets. We design linear-phase filters that adapt the simulat
Channel Charting: An Euclidean Distance Matrix Completion Perspective
Channel charting (CC) is an emerging machine learning framework that aims at learning lower-dimensional representations of the radio geometry from collected channel state information (CSI) in an area of interest, such that spatial relations of the represe
Libri-Adapt: A New Speech Dataset For Unsupervised Domain Adaptation
This paper introduces a new dataset, Libri-Adapt, to support unsupervised domain adaptation research on speech recognition models. Built on top of the LibriSpeech corpus, Libri-Adapt contains 7200 hours of English speech recorded on mobile and embedded-sc
Label Reuse For Efficient Semi-Supervised Learning
In this paper, we propose a new learning strategy for semi-supervised deep learning algorithms, called label reuse, aiming to significantly reduce the expensive computational cost of pseudo label generation and the like for each unlabeled training instanc
End-To-End Non-Negative Autoencoders For Sound Source Separation
Discriminative models for source separation have recently been shown to produce impressive results. However, when operating on sources outside of the training set, these models cannot perform as well and are cumbersome to update. Classical methods like N
Retrieving Vocal-Tract Resonance And Anti-Resonance From High-Pitched Vowels Using A Rahmonic Subtraction Technique
Vocal tract resonances give rise to core spectral information of speech signals. Linear prediction and cepstral methods are widely used for this purpose. However, both approaches are prone to fail as the fundamental frequency (F0) rises. In this study, a
Passive Intelligent Surface Assisted MIMO Powered Sustainable IoT
Lately, Passive Intelligent Surfaces (PIS) are being recognized to play an important role in meeting the timely demand of low-cost green sustainable Internet of Things (IoT). In this paper, we focus on maximizing the sum received power among the energy ha
End-To-End Code-Switching TTS With Cross-Lingual Language Model
Code-switching text-to-speech (TTS) aims to enable a system to speak two languages with a single voice and in the same utterance. In this paper, we propose to incorporate cross-lingual word embedding into an end-to-end TTS system, to improve the voice ren
LAI-Net: Local-Ancestry Inference With Neural Networks
Local-ancestry inference (LAI), also referred to as ancestry deconvolution, provides high-resolution ancestry estimation along the human genome. In both research and industry, LAI is emerging as a critical step in personalized DNA sequence analysis with a
Atomic Norm Based Localization Of Far-Field And Near-Field Signals With Generalized Symmetric Arrays
Most localization methods for mixed far-field (FF) and near-field (NF) sources are based on uniform linear array (ULA) rather than sparse linear array (SLA). In this paper, we propose a localization method for mixed FF and NF sources based on the generaliz
Training Keyword Spotters With Limited And Synthesized Speech Data
With the rise of low power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts in the model creation process is obta
Classify And Explain: An Interpretable Convolutional Neural Network For Lung Cancer Diagnosis
Deep network-based computer-aided diagnosis systems have encountered many difficulties in practical applications because of their "black box" feature. The crux of the problem is that these models should be explainable: the model should provide doctors
Foreground Signature Extraction For An Intimate Mixing Model In Hyperspectral Image Classification
The hyperspectral unmixing problem arises in remote sensing, chemometrics, and biomedical engineering applications. The spectral signature of a single pixel in a hyperspectral cube can be represented as a non-negative combination of non-negative signature
A Hierarchical Tracker For Multi-Domain Dialogue State Tracking
The goal of Dialogue State Tracking (DST) is to estimate the current dialogue state given all the preceding conversation. Due to the increased number of state candidates, the data sparsity problem is still a major hurdle for multi-domain DST. Existing methods
Frequency Diverse Array Radar: A Closed-Form Solution To Design Weights For Desired Beampattern
In contrast to phased-array radar, frequency-diverse-array (FDA) radar transmits signals of linearly increasing frequencies across the array. As a consequence, the beampattern of an FDA radar becomes range, angle, and time dependent, which is different fr
M-Estimators Of Scatter With Eigenvalue Shrinkage
A popular regularized (shrinkage) covariance estimator is the shrinkage sample covariance matrix (SCM) which shares the same set of eigenvectors as the SCM but shrinks its eigenvalues toward its grand mean. In this paper, a more general approach is consid
Speech Intelligibility Enhancement By Equalization For In-Car Applications
In this paper, we propose a speech intelligibility enhancement method for typical in-car applications in noisy environments. While traditional speech enhancement algorithms aim at increasing the Signal to Noise Ratio (SNR), the goal here is to increase in
Content Vs Context: How About "Walking Hand-In-Hand" For Image Clustering?
Image clustering has been one of the most important issues in the field of pattern recognition. However, most existing methods only focus on utilizing either content or context information of images, failing to consider both of them. In fact, the power
Weakly Supervised Semantic Segmentation For Remote Sensing Hyperspectral Imaging
This paper studies the problem of training a semantic segmentation neural network with weak annotations, in order to be applied to aerial vegetation images from Teide National Park. It proposes a Deep Seeded Region Growing system which consists of trainin
On End-To-End Multi-Channel Time Domain Speech Separation In Reverberant Environments
This paper introduces a new method for multi-channel time domain speech separation in reverberant environments. A fully-convolutional neural network structure has been used to directly separate speech from multiple microphone recordings, with no need of c
Portfolio Cuts: A Graph-Theoretic Framework To Diversification
Investment returns naturally reside on irregular domains; however, standard multivariate portfolio optimization methods are agnostic to data structure. To this end, we investigate ways for domain knowledge to be conveniently incorporated into the analysis
HI-MIA: A Far-Field Text-Dependent Speaker Verification Database And The Baselines
This paper presents a large far-field text-dependent speaker verification database named HI-MIA. We aim to meet the data requirement for far-field microphone array based speaker verification since most of the publicly available databases are single channe
Characterization Of A Snapshot Fourier Transform Imaging Spectrometer Based On An Array Of Fabry-Perot Interferometers
This study focuses on a novel snapshot Fourier Transform imaging spectrometer based on an array of Fabry-Perot interferometers. This device fully relies on signal processing in order to provide intelligible outputs and thus requires a precise characterisa
Maximally Energy-Concentrated Differential Window For Phase-Aware Signal Processing Using Instantaneous Frequency
The short-time Fourier transform (STFT) is widely employed in nonstationary signal analysis, and its properties depend on the window function. Instantaneous frequency in STFT, the time-derivative of phase, has recently been applied to many applications including spec
Deep Multi-Region Hashing
Hashing has been widely used for large-scale approximate nearest neighbor retrieval owing to its high efficiency. Among existing hashing methods, deep supervised hashing methods have achieved the best performance by utilizing the semantic labels on data w