Understanding Black-box Predictions via Influence Functions
Pang Wei Koh and Percy Liang. International Conference on Machine Learning (ICML), 2017 (best paper award). arXiv:1703.04730.

How can we explain the predictions of a black-box model? The paper uses influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying the training points most responsible for a given prediction. To scale influence functions up to modern machine learning settings, the authors develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. On linear models and convolutional neural networks, they demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually indistinguishable training-set attacks.
The core idea is to compute the parameter change if a training point z were upweighted by some small \epsilon, giving new parameters

\hat{\theta}_{\epsilon, z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta).

Influence functions give a closed-form first-order approximation to the resulting change in the parameters and in any test loss, so the effect of removing or perturbing a training point can be estimated without retraining the model. The full expressions are collected in the method notes below; a small numerical check of the definition follows.
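As a quick illustration of this definition (not from the paper's code), the following minimal NumPy sketch upweights one training point of a one-dimensional least-squares problem by a small \epsilon and measures the resulting parameter change by a finite difference; all names and constants are illustrative.

```python
# Minimal sketch (illustrative, not the authors' code): finite-difference check of
# the upweighting definition on a 1-D weighted least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 * x + 0.1 * rng.normal(size=20)

def fit(weights):
    # argmin_theta sum_i weights[i] * (y[i] - theta * x[i])^2  (closed form)
    return np.sum(weights * x * y) / np.sum(weights * x * x)

w = np.ones_like(x) / len(x)   # the usual 1/n weighting
theta_hat = fit(w)

eps = 1e-4                     # small upweighting of training point z = (x[0], y[0])
w_up = w.copy()
w_up[0] += eps
theta_eps = fit(w_up)

# Finite-difference estimate of d(theta_hat_{eps,z}) / d(eps) at eps = 0,
# i.e. the influence of z on the fitted parameter.
print((theta_eps - theta_hat) / eps)
```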
Experiments. On an ImageNet dog-vs-fish subset, the paper contrasts the influential training points of an Inception-v3 network with those of an SVM with RBF kernel, trained with a smoothed hinge loss so that the objective is twice differentiable (a sketch of one such smoothing follows below). The Inception network picks up on the distinctive characteristics of the fish, whereas the RBF SVM's influential examples are governed largely by raw similarity to the test image.

Code. The authors' original TensorFlow implementation, kohpangwei/influence-release, replicates the experiments from the paper. A Dockerfile with the required dependencies can be found at https://hub.docker.com/r/pangwei/tf1.1/, and there is a reproducible, executable, Dockerized version of the experiment scripts on Codalab; the datasets for the experiments can also be found at the Codalab link. An unofficial Chainer implementation of the paper also exists; it requires Chainer v3 and uses FunctionHook. A PyTorch implementation, nimarb/pytorch_influence_functions, is described below.
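The hinge loss must be smoothed so that the SVM objective is twice differentiable and influence functions are well defined. Below is a minimal sketch of one standard temperature-based smoothing; the exact parameterization used in the paper may differ.

```python
# Sketch of a temperature-smoothed hinge loss (illustrative; as t -> 0 it
# approaches the standard hinge max(0, 1 - margin)).
import numpy as np

def smooth_hinge(margin, t=0.1):
    # Numerically stable form of t * log(1 + exp((1 - margin) / t)).
    return t * np.logaddexp(0.0, (1.0 - margin) / t)

# Example: with a small temperature the smoothed loss tracks the hinge closely.
m = np.array([-0.5, 0.0, 0.5, 1.0, 2.0])
print(smooth_hinge(m, t=0.001), np.maximum(0.0, 1.0 - m))
```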
PyTorch implementation: nimarb/pytorch_influence_functions. The library calculates, for each test image, which training images had the largest effect on its classification outcome. Results come back as lists of training-data IDs: helpful is a list of numbers, the IDs of the training data samples that were most helpful for the given prediction, and harmful the IDs of those that were most harmful; when influence is calculated for multiple test images, the helpfulness is ordered by average helpfulness to the whole test set. config is a dict which contains the parameters used to calculate the influence.

There are two modes of operation. In calc_img_wise mode, influences are computed for one test image at a time, and all intermediate grad_z values are thrown away afterwards, so they end up being calculated more than once; this is the simplest choice when testing a single test image or only a few of them. The second mode is called calc_all_grad_then_test: it calculates the grad_z values for all images first and saves them to disk, and the algorithm will then calculate the influence functions for all images by reading both sets of saved values from disk and calculating the influence based on them. This mode is recommended if you have a fast SSD, lots of free storage space, and want to calculate the influences on the prediction outcomes of an entire dataset or even more than 1000 test samples.
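A usage sketch assembled from the fragments above; the function names and signatures (init_logging, get_default_config, calc_img_wise) and the model/dataloader helpers are assumptions to check against the repository, not a verified API.

```python
import pytorch_influence_functions as ptif

# Hypothetical helpers: supply your own trained PyTorch model and dataloaders.
model = get_my_model()
trainloader, testloader = get_my_dataloaders()

ptif.init_logging()
config = ptif.get_default_config()   # dict of parameters used to calculate the influence

influences, harmful, helpful = ptif.calc_img_wise(config, model, trainloader, testloader)
# do something with influences/harmful/helpful, e.g. look up the most helpful training IDs
```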
Method notes. We have two ways of measuring the influence of a training instance. The first option is to delete the instance from the training data, retrain the model on the reduced training dataset, and observe the difference in the model parameters or predictions (either individually or over the complete dataset). This leave-one-out retraining is exact but far too expensive to repeat for every training point. The second option, the influence function, approximates the same quantity in closed form without any retraining, using only gradients and Hessian-vector products, and the paper shows that the resulting estimates align well with actual leave-one-out retraining.
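The following minimal sketch makes that comparison concrete for L2-regularized binary logistic regression (labels in {-1, +1}), where the Hessian is small enough to form exactly; scikit-learn is used only as a convenient solver, and all names and constants are illustrative rather than taken from the paper's code.

```python
# Minimal sketch: influence-function estimate vs. actual leave-one-out retraining
# for L2-regularized logistic regression (labels in {-1, +1}). Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.where(X @ w_true + 0.5 * rng.normal(size=n) > 0, 1, -1)

lam = 0.1  # strength of (lam/2) * ||theta||^2 added to the average training loss

def fit(Xs, ys):
    # sklearn minimizes C * sum_i log-loss + 0.5 * ||theta||^2; keeping C = 1/(lam * n)
    # with the *original* n matches removing a point that carried weight 1/n.
    clf = LogisticRegression(C=1.0 / (lam * n), fit_intercept=False, tol=1e-10, max_iter=10000)
    clf.fit(Xs, ys)
    return clf.coef_.ravel()

def loss(theta, x, y_):
    return np.log1p(np.exp(-y_ * (x @ theta)))

def grad(theta, x, y_):
    return -y_ * x / (1.0 + np.exp(y_ * (x @ theta)))

theta = fit(X, y)
x_test, y_test = rng.normal(size=d), 1

# Hessian of (1/n) sum_i L(z_i, theta) + (lam/2) ||theta||^2
s = 1.0 / (1.0 + np.exp(-(X @ theta)))
H = (X.T * (s * (1.0 - s))) @ X / n + lam * np.eye(d)

i = 0  # training point to remove
# I_up,loss(z_i, z_test) = -grad L(z_test)^T H^{-1} grad L(z_i); removing z_i is eps = -1/n,
# so the predicted change in the test loss is -(1/n) * I_up,loss.
delta_pred = grad(theta, x_test, y_test) @ np.linalg.solve(H, grad(theta, X[i], y[i])) / n

# Actual leave-one-out retraining.
theta_loo = fit(np.delete(X, i, axis=0), np.delete(y, i))
delta_true = loss(theta_loo, x_test, y_test) - loss(theta, x_test, y_test)

print(f"influence prediction: {delta_pred:+.6f}   leave-one-out: {delta_true:+.6f}")
```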
Method details. This is the work that received the ICML 2017 best paper award, and the key derivations are short. Upweighting a training point z by \epsilon gives the parameters

\hat{\theta}_{\epsilon, z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta),

and the classical influence-function result gives the effect of the upweighting on the parameters in closed form,

\mathcal{I}_{\text{up,params}}(z) \stackrel{\text{def}}{=} \left.\frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}),

where H_{\hat{\theta}} is the Hessian of the empirical risk at the minimizer \hat{\theta}. Chaining through the loss at a test point z_{\text{test}} gives

\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) \stackrel{\text{def}}{=} \left.\frac{d L(z_{\text{test}}, \hat{\theta}_{\epsilon, z})}{d \epsilon}\right|_{\epsilon=0} = \left.\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} \frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon}\right|_{\epsilon=0} = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}).

Removing a training point corresponds to \epsilon = -1/n.

For input perturbations, write z = (x, y) and z_{\delta} \stackrel{\text{def}}{=} (x + \delta, y), and define

\hat{\theta}_{\epsilon, z_{\delta}, -z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta).

Then

\left.\frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon}\right|_{\epsilon=0} = \mathcal{I}_{\text{up,params}}(z_{\delta}) - \mathcal{I}_{\text{up,params}}(z) = -H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right) \approx -H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta,

so replacing z by z_{\delta} (each at weight 1/n) shifts the parameters by approximately

\hat{\theta}_{z_{\delta}, -z} - \hat{\theta} \approx -\frac{1}{n} H_{\hat{\theta}}^{-1}\left[\nabla_{x} \nabla_{\theta} L(z, \hat{\theta})\right] \delta,

and the influence of the perturbation on the test loss is

\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^{\top} \stackrel{\text{def}}{=} \left.\nabla_{\delta} L(z_{\text{test}}, \hat{\theta}_{z_{\delta}, -z})^{\top}\right|_{\delta=0} = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}).

For binary logistic regression, \mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) has the explicit form

-y_{\text{test}}\, y \cdot \sigma\left(-y_{\text{test}} \theta^{\top} x_{\text{test}}\right) \cdot \sigma\left(-y \theta^{\top} x\right) \cdot x_{\text{test}}^{\top} H_{\hat{\theta}}^{-1} x,

where \sigma is the sigmoid: a training point is influential when both it and the test point incur high loss and their features align under the inverse-Hessian metric. This makes \mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) directly useful for debugging training data, since it identifies the training points whose upweighting most increases or decreases the loss at a given test point.

Applications. On the ImageNet dog-vs-fish task (900 training images per class, Inception v3 versus an SVM with RBF kernel), the helpful and harmful training images of the two models differ qualitatively, as described above. In the data-poisoning experiment, visually indistinguishable perturbations of training images flipped the model's prediction on 57% of the 591 test images when a single training image was modified, rising to 77%, and up to 590 of the 591 test images, as up to 10 training images were attacked; such training-set attacks are the training-time counterpart of adversarial examples. For detecting dataset errors, 10% of the labels of a spam-classification training set were flipped, and ranking training points by their self-influence \mathcal{I}_{\text{up,loss}}(z_i, z_i) surfaces the mislabeled examples with far less checking effort than ranking by training loss or inspecting points at random.

Efficiency. Forming and inverting H_{\hat{\theta}} explicitly costs O(n p^2 + p^3) for p parameters, which is infeasible for large networks. The paper therefore computes s_{\text{test}} \stackrel{\text{def}}{=} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z_{\text{test}}, \hat{\theta}) with implicit Hessian-vector products (conjugate gradient and the stochastic LiSSA estimator) at a cost linear in n and p; each training point's influence is then a single dot product of its gradient with s_{\text{test}}.
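A minimal PyTorch sketch of this Hessian-vector-product approach to estimating s_test (illustrative, not the authors' implementation; the damping, scale, and number of recursion steps are placeholder values):

```python
# Illustrative PyTorch sketch of estimating s_test = H^{-1} grad L(z_test, theta_hat)
# with stochastic Hessian-vector products (LiSSA-style recursion); not the authors' code.
from itertools import cycle, islice
import torch

def hvp(loss, params, vec):
    # Hessian-vector product via double backprop (Pearlmutter's trick).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

def estimate_s_test(model, loss_fn, z_test, train_loader,
                    damping=0.01, scale=25.0, steps=1000):
    params = [p for p in model.parameters() if p.requires_grad]
    x_t, y_t = z_test
    v = torch.autograd.grad(loss_fn(model(x_t), y_t), params)  # gradient of the test loss
    h = [vi.clone() for vi in v]                                # running estimate of scale * H^{-1} v
    for x, y in islice(cycle(train_loader), steps):
        hv = hvp(loss_fn(model(x), y), params, h)
        # Recursion h <- v + (1 - damping) * h - (H h) / scale; its fixed point is
        # approximately scale * H^{-1} v for suitably chosen damping and scale.
        h = [vi + (1.0 - damping) * hi - hvi / scale
             for vi, hi, hvi in zip(v, h, hv)]
    return [hi / scale for hi in h]

# The influence of a training point z on the test loss is then
# I_up,loss(z, z_test) = - grad_theta L(z, theta_hat) . s_test.
```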
Validated against actual leave-one-out retraining, the influence estimates correlate well (correlations of about 0.86 with the approximated Hessian and 0.95 for the SVM with hinge loss are reported). The method is conceptually straightforward, which is part of why it stands out as a best-paper result: a single classical tool, the influence function, connects a model's loss at a test point back to individual training examples. Related follow-up work applies the same machinery to data subsampling, e.g. "Less Is Better: Unweighted Data Subsampling via Influence Function."

Understanding Black-box Predictions via Influence Functions. International Conference on Machine Learning (ICML), 2017. See the original paper linked here.