Multi Evidence Filtering and Fusion for Multi Label

  • Date:29 Jun 2020
  • Pages:10


Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning

Weifeng Ge, Sibei Yang, Yizhou Yu
Department of Computer Science, The University of Hong Kong

Abstract. Supervised object detection and semantic segmentation require object or even pixel level annotations. When there exist image level labels only, it

obtain object localization and pixelwise semantic labeling results for the training images first using their image-level labels, and then use such intermediate results to train object detection, semantic segmentation, and multi-label image classification networks in a fully supervised manner.

Since image-level, object-level and pixel-level analyses have mutual dependencies, they are not performed independently but organized into a single pipeline with four stages. In the first stage, we collect object localization results in the training images from both bottom-up and top-down weakly supervised object detection algorithms. In the second stage, we incorporate both metric learning and density-based clustering to filter detected object instances. In this way, we obtain a relatively clean and complete set of object instances. Given these object instances, we further train a single-label object classifier, which is applied to all object instances to obtain their final class labels. Third, to obtain a relatively clean pixel-wise probability map for every class and every training image, we fuse the image-level attention map, object-level attention maps and an object detection heat map. The pixel-wise probability maps are used for training a fully convolutional network, which is applied to all training images to obtain their final pixel-wise label maps. Finally, the obtained object instances and pixel-wise label maps for all the training images are used for training deep networks for object detection and semantic segmentation, respectively. To make pixel-wise label maps of the training images help multi-label image classification, we perform multi-task learning by training a single deep network with two branches, one for multi-label image classification and the other for pixel labeling. Experiments show that our weakly supervised curriculum learning system is capable of achieving state-of-the-art results in multi-label image classification as well as weakly supervised object detection, and very competitive results in weakly supervised semantic segmentation on MS-COCO [26], PASCAL VOC 2007 and PASCAL VOC 2012 [12].

In summary, this paper has the following contributions:

  • We introduce a novel weakly supervised pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate labeling results for the training images, and then use such results to train task-specific networks in a fully supervised manner.
  • To localize object instances relatively accurately in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances.
  • To obtain a relatively clean pixel-wise probability map for every class and every training image, we propose an algorithm for fusing image-level and object-level attention maps with an object detection heat map. The fused maps are used for training a fully convolutional network for pixel labeling.

2. Related Work

Weakly Supervised Object Detection and Segmentation. Weakly supervised object detection and segmentation respectively locate and segment objects with image-level labels only [28, 7]. They are important for two reasons: first, learning complex visual concepts from image-level labels is one of the key components in image understanding; second, fully supervised deep learning is too data hungry. Methods in [28, 10, 9] treat the weakly supervised localization problem as an image classification problem, and obtain object locations in specific pooling layers of their networks. Methods in [4, 38] extract object instances from images using selective search [40] or edge boxes [48], and convert the weakly supervised detection problem into a multi-instance learning problem [8]. The method in [8] first learns object masks as in [10, 9], and then uses the EM algorithm to force the network to learn object segmentation masks obtained at previous stages. Since it is very hard for a network to directly learn object locations and pixel labels without sufficient supervision, in this paper we decompose object detection and pixel labeling into multiple easier problems and solve them progressively in multiple stages.

Neural Attention. Many efforts [44, 2, 23] have been made to explain how neural networks work. The method in [23] extends layer-wise relevance propagation (LRP) [1] to comprehend inherent structured reasoning of deep neural networks. To further ignore the cluttered background, a positive neural attention back-propagation scheme, called excitation back-propagation (Excitation BP), is introduced in [44]. The method in [2] locates top activations in each convolutional map, and maps these top activation areas into the input image using bilinear interpolation.

In our pipeline, we adopt excitation BP [44] to calculate pixel-wise class probabilities. However, for images with multiple category labels, a deep neural network could fuse the activations of different categories in the same neurons. To solve this problem, we train a single-label object instance classification network and perform excitation BP in this network to obtain more accurate pixel-level class probabilities.

Curriculum Learning. Curriculum learning [3] is part of the broad family of machine learning methods that start with easier subtasks and gradually increase the difficulty level of the tasks. In [3], Bengio et al. describe the concept of curriculum learning, and use a toy classification problem to show the advantage of decomposing a complex problem into several easier ones. In fact, the idea behind curriculum learning has been widely used before [3]. Hinton et al. [17] trained a deep neural network layer by layer using a restricted Boltzmann machine [36] to avoid the local minima in deep neural networks. Many machine learning algorithms [37, 14] follow a similar divide-and-conquer strategy in curriculum learning.

In this paper, we adopt this strategy to decompose the pixel labeling problem into image-level learning, object-instance-level learning and pixel-level learning. All the learning tasks in these three stages are relatively simple, using the training data in the current stage and the output from the previous stage.

Figure 1. The proposed weakly supervised pipeline. From left to right: (a) Image-level stage: fuse the object heatmaps $H$ and the image attention map $A_g$ to generate object instances $R$ for the instance-level stage, and provide these two maps for information fusion at the pixel-level stage. (b) Instance-level stage: perform triplet-loss-based metric learning and density-based clustering for outlier detection, and train a single-label instance classifier $\phi_s$ for instance filtering. (c) Pixel-level stage: integrate the object heatmaps $H$, instance attention map $A_l$, and image attention map $A_g$ for pixel labeling with uncertainty.

3. Weakly Supervised Curriculum Learning

3.1. Overview

Given an image $I$ associated with an image-level label vector $y_I = [y^1, y^2, \ldots, y^C]^T$, our weakly supervised curriculum learning aims to obtain pixel-wise labels $Y_I = [y_1, y_2, \ldots, y_P]^T$, and then use these labels to assist weakly supervised object detection, semantic segmentation and multi-label image classification. Here $C$ is the total number of object classes, $P$ is the total number of pixels in $I$, and $y^l$ is binary: $y^l = 1$ means the $l$-th object class exists in $I$, and $y^l = 0$ otherwise. The label of a pixel $p$ is denoted by a $C$-dimensional binary vector $y_p$. The number of object classes existing in $I$, which is the same as the number of positive components of $y_I$, is denoted by $K$. Following the divide-and-conquer idea in curriculum learning [3], we decompose the pixel labeling task into three stages: the image-level stage, the instance-level stage, and the pixel-level stage.

3.2. Image-Level Stage

The image-level stage not only decomposes multi-label image classification into a set of single-label object instance classifications, but also provides an initial set of pixel-wise probability maps for the pixel-level stage.

Object Heatmaps. Unlike the fully supervised case, weakly supervised object detection produces object instances with higher uncertainty and also misses a higher percentage of true objects. To reduce the number of missing detections, we propose to compute an object heatmap $H$ for every object class existing in the image.

For an image $I$ with width $W$ and height $H$, a dense set of object proposals $R = [R_1, R_2, \ldots, R_n]$ is generated using sliding anchor windows, with the feature stride $s$ set to 8. The number of locations in the input image where we can place anchor windows is $(H/s) \times (W/s)$. Denote the short side of image $I$ by $L$. Following the setting used for RPN [29], we let the anchor windows at a single location have four scales $[L/8, L/4, L/2, L]$ and three aspect ratios $[0.5, 1, 2]$. After proposals out of image borders have been removed, there are usually 12000 remaining proposals per image. Here we define a stack of object heatmaps $H = [H^1, H^2, \ldots, H^C]$ as a $C \times H \times W$ matrix, and all values are set to zero initially. The object detection and classification network $\phi_d$ used here is the weakly supervised object testing net VGG-16 from [38]. For every proposal $R_i$ in $R$, its object class probability vector $\phi_d(I, R_i)$ is added to all the pixels in the corresponding window in the heatmaps. Then every heatmap is normalized to $[0, 1]$ as follows,

$$H^c = \frac{H^c - \min(H^c)}{\max(H^c)},$$

where $H^c$ is the heatmap for the $c$-th object class. Note that only the heatmaps for object classes existing in $I$ are normalized. All the other heatmaps are ignored and set to zero.

Figure 2. (a) Proposals $R^h$ and $R^l$ generated from an object heatmap, (b) proposals generated from an attention map, (c) filtered proposals (green), heatmap proposals (red and blue), and attention proposals (purple).

Multiple Evidence Fusion. The object heatmaps highlight the regions that may contain objects even when the level of supervision is very weak. However, since they are generated using sliding anchor windows at multiple scales and aspect ratios, they tend to highlight pixels near but outside true objects, as shown in Fig. 2. Given an image classification network trained using the image-level labels (here we use GoogleNet V1 [44]), neural attention calculates the contribution of every pixel to the final classification result. It tends to focus on the most influential regions but not necessarily the entire objects. Note that false positive regions may occur during excitation BP [44]. To obtain more accurate object instances, we integrate the top-down attention maps $A_g = [A_g^1, A_g^2, \ldots, A_g^C]$ with the object heatmaps $H = [H^1, H^2, \ldots, H^C]$.

For object classes existing in image $I$, their corresponding heatmaps $H$ and attention maps $A_g$ are thresholded by distinct values. The heatmaps $H$ are too smooth to indicate accurate object boundaries, but they provide important spatial priors to constrain object instances obtained from the attention maps. We assume that regions with a sufficiently high value in the object heatmaps should at least include parts of objects, and regions with sufficiently low values everywhere do not contain any objects. Following this assumption, we threshold the heatmaps with two values (0.65 and 0.1) to identify highly confident object proposals $R^h = [R^h_1, R^h_2, \ldots, R^h_{N_h}]$ and relatively low confident object proposals $R^l = [R^l_1, R^l_2, \ldots, R^l_{N_l}]$ after connected component

3.3. Instance-Level Stage

Since multiple object categories present in the same image make it hard for neural attention to obtain an accurate pixel-wise attention map for each class, we train a single-label object instance classification network and compute attention maps in this network to obtain more accurate pixel-level class probabilities. The fused object instances from the image-level stage are further filtered by metric learning and density-based clustering. The remaining labeled object proposals are used for training this object instance classifier, which can also be used to further remove remaining false positive object instances.

Figure 3. (a) Input proposals of the triplet-loss network, (b) distance map computed using features from the triplet-loss network.

Metric Learning for Feature Embedding. Metric learning is popular in face recognition [34], person re-identification and object tracking [34, 46, 39]. It embeds an image $X$ into a multi-dimensional feature space by associating this image with a fixed-size vector $\phi_t(X)$ in the feature space. This embedding makes similar images close to each other and dissimilar images apart in the feature space. Thus the similarity between two images can be measured by their distance in this space. The triplet-loss network $\phi_t(\cdot)$ proposed in [34] has the additional property that it can well separate classes even when intra-class distances have large variations. When there exist training samples associated with incorrect class labels, the loss stays at a high value, and the distances between correctly labeled and mislabeled samples remain very large even after the training process has run for a long time. Now let $R = [R_1, R_2, \ldots, R_O]^T$ denote the fused object
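The object heatmap construction described in Section 3.2 (sliding anchor windows, accumulation of class probabilities, and normalization) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names `anchor_windows` and `build_heatmaps` are hypothetical, the per-window class probabilities are passed in as plain lists instead of being computed by a detection network, and the aspect ratio is assumed to mean height divided by width with the window area fixed by the scale.

```python
import math


def anchor_windows(width, height, stride=8, ratios=(0.5, 1.0, 2.0)):
    """Enumerate anchor windows (x0, y0, x1, y1) lying fully inside the image."""
    short_side = min(width, height)
    # Four scales tied to the short side L of the image: L/8, L/4, L/2, L.
    scales = [short_side / 8, short_side / 4, short_side / 2, short_side]
    windows = []
    for cy in range(0, height, stride):
        for cx in range(0, width, stride):
            for scale in scales:
                for ratio in ratios:
                    # Assumption: ratio = window height / window width.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    x0, y0 = round(cx - w / 2), round(cy - h / 2)
                    x1, y1 = round(cx + w / 2), round(cy + h / 2)
                    # Drop windows that cross the image border.
                    if x0 >= 0 and y0 >= 0 and x1 <= width and y1 <= height:
                        windows.append((x0, y0, x1, y1))
    return windows


def build_heatmaps(width, height, num_classes, windows, scores, present):
    """Accumulate per-window class probabilities into one heatmap per class.

    scores[i][c] is the probability of class c for windows[i] (supplied by
    the caller here); present lists the classes that exist in the image.
    """
    heat = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for (x0, y0, x1, y1), probs in zip(windows, scores):
        for c in present:  # heatmaps of absent classes stay at zero
            for y in range(y0, y1):
                for x in range(x0, x1):
                    heat[c][y][x] += probs[c]
    for c in present:
        flat = [v for row in heat[c] for v in row]
        lo, hi = min(flat), max(flat)
        if hi > 0:
            # Normalization as stated in the text: (H - min(H)) / max(H).
            heat[c] = [[(v - lo) / hi for v in row] for row in heat[c]]
    return heat
```

Accumulating only the classes listed in `present` gives the same result as adding the full probability vector and then zeroing the heatmaps of absent classes, at lower cost.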

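The two-threshold step in the multiple evidence fusion paragraph (thresholds 0.65 and 0.1 applied to a normalized heatmap) can be illustrated with a small connected-component pass. This is a sketch under stated assumptions, not the authors' code: the source sentence is cut off after "connected component", so the use of 4-connectivity and of inclusive bounding boxes as proposals are choices made here, and the names `component_boxes` and `heatmap_proposals` are hypothetical.

```python
from collections import deque


def component_boxes(heatmap, threshold):
    """Bounding boxes (x0, y0, x1, y1), inclusive, of 4-connected regions
    whose values are at least `threshold`."""
    height, width = len(heatmap), len(heatmap[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for sy in range(height):
        for sx in range(width):
            if seen[sy][sx] or heatmap[sy][sx] < threshold:
                continue
            # Breadth-first flood fill from an unvisited above-threshold pixel.
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and heatmap[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def heatmap_proposals(heatmap, high=0.65, low=0.1):
    """Return (high-confidence boxes, low-confidence boxes) for one class."""
    return component_boxes(heatmap, high), component_boxes(heatmap, low)
```

The high threshold yields small regions that should contain at least part of an object, while the low threshold yields larger regions outside of which no object is expected.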