Online sketching of big categorical data with absent features

With the scale of data growing every day, reducing the dimensionality of high-dimensional vectors (a.k.a. sketching) has emerged as a task of increasing importance. Challenges in this context include the sheer volume of data vectors, which may consist of categorical (that is, finite-alphabet) features; the typically streaming format of data acquisition; and possibly absent features. To cope with these challenges, the present paper puts forth a novel rank-regularized maximum likelihood approach that models categorical data as quantized values of analog-amplitude features with low intrinsic dimensionality. This model, together with recent advances in online rank regularization, is leveraged to sketch high-dimensional categorical data "on the fly." Simulated tests with synthetic as well as real-world datasets corroborate the merits of the novel scheme relative to state-of-the-art alternatives.
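
The data model described in the abstract can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the paper's algorithm: the function name, the Gaussian factors, the quantile-based thresholds, the noise-free quantizer, and the use of -1 to mark absent entries are all illustrative choices that do not come from the source. It generates low-rank analog-amplitude features, quantizes them to a finite alphabet, and hides a random subset of entries.

```python
import numpy as np

def simulate_categorical_data(n_features=50, n_samples=200, rank=3,
                              n_levels=4, observed_fraction=0.7, seed=0):
    """Hypothetical helper: low-rank analog features -> categorical data with gaps."""
    rng = np.random.default_rng(seed)

    # Analog-amplitude features with low intrinsic dimensionality:
    # X = U @ V.T has rank at most `rank`.
    U = rng.standard_normal((n_features, rank))
    V = rng.standard_normal((n_samples, rank))
    X = U @ V.T

    # Quantize to a finite alphabet {0, ..., n_levels - 1}.
    # Thresholds at empirical quantiles are an assumption made for illustration.
    thresholds = np.quantile(X, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    Y = np.digitize(X, thresholds)

    # Absent features: each entry is observed independently with the given probability.
    observed = rng.random(X.shape) < observed_fraction
    Y_observed = np.where(observed, Y, -1)  # -1 marks an absent entry (assumption)

    return X, Y_observed, observed


X, Y_observed, observed = simulate_categorical_data()
print(X.shape, np.linalg.matrix_rank(X), observed.mean())
```

A sketching method in the spirit of the abstract would see only `Y_observed` and `observed`, one column at a time, and would try to recover a low-dimensional representation of the columns of `X`.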
