Matrix Approximation in Data Streams | by Mina Ghashami | Sep, 2023

Matrix Approximation in Data Streams | by Mina Ghashami | Sep, 2023

[ad_1]

Approximate a matrix without having all of its rows

Mina Ghashami
Towards Data Science
Image credit: unsplash.com

Matrix approximation is a heavily studied sub-field in data mining and machine learning. A large set of data analysis tasks rely on obtaining a low rank approximation of matrices. Examples are dimensionality reduction, anomaly detection, data de-noising, clustering, and recommendation systems. In this article, we look into the problem of matrix approximation, and how to compute it when the whole data is not available at hand!

The content of this article is partly taken from my lecture at Stanford -CS246 course. I hope you find it useful. Please find the full content here.

Most data generated on web can be represented as a matrix, where each row of the matrix is a data point. For example, in routers every packet sent across the network is a data point that can be represented as a row in a matrix of all data points. In retail, every purchase made is a row in the matrix of all transactions.

Figure 1: Data as a matrix — Image by the author

At the same time, almost all data generated over web is of a streaming nature; meaning the data is generated by an external source at a rapid rate which we have no control over. Think of all searches users make on Google search engine in a every second. We call this data the streaming data; because just like a stream it pours in.

Some examples of typical streaming web-scale data are as following:

Figure 2: Size of typical streaming web-scale data — Image by the author

Think of streaming data as a matrix A containing n rows in d-dimensional space, where typically n >> d. Often n is in order of billions and increasing.

In streaming model, data arrives at high speed, one row at a time, and algorithms must process the items fast, or they are lost forever.

[ad_2]
Source link

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *