Reduces the CPU instruction count of the conv_gen function by ~30%. Improves the performance of the convolution operation by 20-25%.