[ad_1]
A real-world case study of performance optimization in Numpy
This is a relatively brief article. In it, I will use a real-world scenario as an example to explain how to use Numexpr expressions in multidimensional Numpy arrays to achieve substantial performance improvements.
There aren’t many articles explaining how to use Numexpr in multidimensional Numpy arrays and how to use Numexpr expressions, so I hope this one will help you.
Recently, while reviewing some of my old work, I stumbled upon this piece of code:
def predict(X, w, b):
z = np.dot(X, w)
y_hat = sigmoid(z)
y_pred = np.zeros((y_hat.shape[0], 1))for i in range(y_hat.shape[0]):
if y_hat[i, 0] < 0.5:
y_pred[i, 0] = 0
else:
y_pred[i, 0] = 1
return y_pred
This code transforms prediction results from probabilities to classification results of 0 or 1 in the logistic regression model of machine learning.
But heavens, who would use a for loop
to iterate over Numpy ndarray?
You can foresee that when the data reaches a certain amount, it will not only occupy a lot of memory, but the performance will also be inferior.
That’s right, the person who wrote this code was me when I was younger.
With a sense of responsibility, I plan to rewrite this code with the Numexpr library today.
Along the way, I will show you how to use Numexpr and Numexpr’s where
expression in multidimensional Numpy arrays to achieve significant performance improvements.
If you are not familiar with the basic usage of Numexpr, you can refer to this article:
Source link