Terry Wu

Computer Vision, Deep Learning



LeetCode

Posted on 2019-04-05 | In Programming

Array

YOLOv3 Code Analysis

Posted on 2019-03-04 | In Detection

The code is based on the YOLOv3 implementation in GluonCV 0.4.0.

PyTorch

Posted on 2019-02-14 | In Deep Learning Frameworks

A collection of common problems and tips I have run into while using PyTorch; continuously updated.

Imbalance Learning

Posted on 2018-12-29 | In Long-tailed Recognition

A summary of methods for handling imbalanced data.

Multi Person Pose Estimation

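Posted on 2018-12-28 | In Pose Estimation, 2D Pose Estimation

Human pose estimation methods fall into two categories:

  • Top-down (two-step framework): first detect each person with a human detector, then estimate the keypoints of each detected person separately.

Drawback: this approach depends heavily on the accuracy of the detector's bounding boxes.

  • Bottom-up (part-based framework): first detect body parts (keypoints), then assemble these keypoints into individual poses.

Drawback: poses are hard to estimate accurately in crowded scenes; moreover, relying only on secondary, part-level information (keypoints) loses the ability to capture body parts from the global pose.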
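To make the top-down data flow concrete, here is a minimal plain-Python sketch. The detector and single-person pose model are passed in as callables; `top_down_pose`, `fake_detector`, `fake_pose_model`, and the convention that keypoints come back relative to the box's top-left corner are all hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in full-image pixels
Keypoint = Tuple[float, float, float]     # (x, y, score)


def top_down_pose(image: object,
                  detect_people: Callable[[object], List[Box]],
                  estimate_in_box: Callable[[object, Box], List[Keypoint]]) -> List[List[Keypoint]]:
    """Two-step top-down pipeline: detect people, then estimate keypoints per person."""
    poses = []
    for box in detect_people(image):             # step 1: person detection
        x1, y1 = box[0], box[1]
        local_kps = estimate_in_box(image, box)   # step 2: keypoints relative to the box corner
        # shift box-relative coordinates back into full-image coordinates
        poses.append([(x + x1, y + y1, s) for x, y, s in local_kps])
    return poses  # a person the detector misses never gets a pose


# toy stand-ins for a real detector and pose model (hypothetical)
def fake_detector(image: object) -> List[Box]:
    return [(10.0, 20.0, 50.0, 120.0), (200.0, 40.0, 260.0, 180.0)]


def fake_pose_model(image: object, box: Box) -> List[Keypoint]:
    return [(5.0, 8.0, 0.9)]  # one keypoint per person


print(top_down_pose(None, fake_detector, fake_pose_model))
```

The sketch makes the stated drawback visible: every pose is produced inside a detected box, so a missed or badly placed box directly loses or corrupts that person's keypoints.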


Summary of Anti-Spoofing

Posted on 2018-12-26 | In Face Recognition

Spoofing attacks against face recognition fall mainly into the following three types:

  • print attack
  • replay/video attack
  • 3D mask attack

Dataset

| Dataset | # subj. / # sess. | Links | Year | Spoof attacks | Publish Time |
| --- | --- | --- | --- | --- | --- |
| NUAA | 15/3 | Link | 2010 | Print | 2010 |
| CASIA-MFSD | 50/3 | Link | 2012 | Print, Replay | 2012 |
| Replay-Attack | 50/1 | Link | 2012 | Print, 2 Replay | 2012 |
| MSU-MFSD | 35/1 | Link | 2015 | Print, 2 Replay | 2015 |
| MSU-USSA | 1140/1 | Link | 2016 | 2 Print, 6 Replay | 2016 |
| Oulu-NPU | 55/3 | Link | 2017 | 2 Print, 6 Replay | 2017 |
| SiW | 165/4 | Link | 2018 | 2 Print, 4 Replay | 2018 |
| ROSE-Youtu | 25/* | Link | 2018 | Print, Replay, Mask | 2018 |

Details:

  • NUAA: contains 12,641 still images;
  • CASIA: 600 videos of 50 subjects, covering three attack types (photo, cut photo, video). For each subject, the genuine faces and the faces captured under the three attack types come in three image qualities (low, normal, high);
  • Replay-Attack Dataset: 1300 videos of 50 subjects. For each subject there are two recording backgrounds (controlled and adverse), three attack types (print, digital photo and video), and two attack modes (fixed and hand-held);
  • MSU_USSA: http://biometrics.cse.msu.edu/Publications/Databases/MSU_USSA/
  • Oulu-NPU: 4950 genuine and attack videos. Capture devices: Samsung Galaxy S6 edge, HTC Desire EYE, MEIZU X5, ASUS Zenfone Selfie, Sony XPERIA C5 Ultra Dual and OPPO N3;
  • SiW: videos of 165 subjects, each with 8 live and 20 spoof videos, for 4478 videos in total. Each video is 30 fps, 15 seconds long, 1080p HD. Live videos vary in distance, pose, illumination and expression; spoof videos include printed paper and screen replay;
  • ROSE-Youtu: 4225 videos of 25 subjects (3350 videos of 20 subjects are publicly available, 5.45 GB in total). Each subject has 150~200 video clips of about 10 seconds each. Capture devices: Hasee, HUAWEI, iPad 4, iPhone 5s, ZTE. The face-to-camera distance is 30~50 cm.

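Deformable ConvNet V1 & V2

Posted on 2018-11-30 | In Network Structures

| Type | Formula |
| --- | --- |
| Regular Convolution | $y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k)$ |
| Deformable ConvNet v1 | $y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k)$ |
| Deformable ConvNet v2 | $y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k$ |

Here $p_k$ enumerates the $K$ sampling locations of the regular kernel grid (e.g. $K = 9$ for a 3×3 kernel), $\Delta p_k$ is the learned offset for location $k$, and $\Delta m_k \in [0, 1]$ is the learned modulation scalar added in v2.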
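The three formulas differ only in where the input is sampled and how each sample is scaled. The following minimal plain-Python sketch evaluates $y(p)$ for a single channel and a single output location, assuming a 3×3 kernel and bilinear interpolation with zero padding outside the feature map. `bilinear` and `deform_conv_at` are hypothetical names; this is not an efficient or official implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]

# p_k for a 3x3 kernel (K = 9), row-major from the top-left tap
KERNEL_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x: Grid, py: float, px: float) -> float:
    """Sample x at a fractional location; neighbours outside the map count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy][xx]
    return value


def deform_conv_at(x: Grid, w: Sequence[float], p: Tuple[int, int],
                   offsets: Sequence[Tuple[float, float]],
                   masks: Optional[Sequence[float]] = None) -> float:
    """y(p) = sum_k w_k * x(p + p_k + dp_k) * dm_k at one location of one channel.

    All-zero offsets with masks=None give a regular conv; masks=None gives v1.
    """
    y = 0.0
    for k, (dy, dx) in enumerate(KERNEL_OFFSETS):
        oy, ox = offsets[k]                       # learned offset Δp_k
        m = 1.0 if masks is None else masks[k]    # learned modulation Δm_k (v2 only)
        y += w[k] * bilinear(x, p[0] + dy + oy, p[1] + dx + ox) * m
    return y


x = [[float(5 * r + c) for c in range(5)] for r in range(5)]  # 5x5 toy feature map
ones = [1.0] * 9
zero = [(0.0, 0.0)] * 9
print(deform_conv_at(x, ones, (2, 2), zero))                   # regular conv: 108.0
print(deform_conv_at(x, ones, (2, 2), [(0.0, 0.5)] * 9))       # half-pixel shift: 112.5
print(deform_conv_at(x, ones, (2, 2), zero, masks=[0.5] * 9))  # modulated: 54.0
```

Because the offsets are fractional, bilinear interpolation is what keeps $y(p)$ differentiable with respect to $\Delta p_k$, which is how the offsets can be learned.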

Tricks in MXNet & Gluon

Posted on 2018-10-23 | In Deep Learning Frameworks

Gluon

Error:

```
RuntimeError: Parameter 'stn_conv0_weight' was not initialized on context gpu(0). It was only initialized on [gpu(0)].
```

Solution:

The parameter initialization must be placed before `gluon.Trainer(net.collect_params(), opt.optimizer)`.


Extracting intermediate-layer outputs

Method 1: subclass Block and implement forward

... (to be continued)

Method 2: use SymbolBlock

```python
import mxnet as mx
from mxnet import gluon

net = gluon.model_zoo.vision.densenet121(ctx=ctx)
net.load_parameters("./densenet.params", ctx=ctx)
data = mx.sym.var('data')
out = net(data)                  # calling the block on a Symbol builds its graph
internals = out.get_internals()  # internals.list_outputs() lists the valid layer names
out_list = [internals['densenet0_stage3_conv13_fwd_output'], out]
net = gluon.SymbolBlock(out_list, data, params=net.collect_params())
```

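Freezing layers

  • Method 1: run the frozen layers outside `autograd.record()`, so they only do a forward pass and no gradients are back-propagated into them.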

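Setting different learning rates for different layers

```python
from mxnet import gluon

net = gluon.model_zoo.vision.densenet121(pretrained=True, ctx=ctx)
# net.features is the convolutional backbone; scale its learning rate by 0.1
for name, params in net.features.collect_params().items():
    params.lr_mult = 0.1
```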

The learning rate of these layers becomes base_learning_rate * params.lr_mult.
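The rule above can be illustrated with a plain SGD step in Python, so the effect is visible without MXNet. `sgd_step` and the parameter names are hypothetical; this is a sketch of the learning-rate scaling, not MXNet's optimizer code. Parameters without an entry use a multiplier of 1.0, matching the default `lr_mult` of a Gluon `Parameter`.

```python
from typing import Dict, List


def sgd_step(params: Dict[str, List[float]], grads: Dict[str, List[float]],
             lr_mult: Dict[str, float], base_lr: float) -> None:
    """Plain SGD, in place: parameter `name` uses base_lr * lr_mult[name] (default 1.0)."""
    for name, weights in params.items():
        lr = base_lr * lr_mult.get(name, 1.0)
        for i, g in enumerate(grads[name]):
            weights[i] -= lr * g


# hypothetical two-part model: backbone ("features") and classifier ("output")
params = {"features.weight": [1.0, 1.0], "output.weight": [1.0, 1.0]}
grads = {"features.weight": [1.0, -1.0], "output.weight": [1.0, -1.0]}
sgd_step(params, grads, lr_mult={"features.weight": 0.1}, base_lr=0.5)
print(params)  # the backbone moves by 0.05 per step, the classifier by 0.5
```

A small multiplier on a pretrained backbone keeps its features close to the pretrained ones while the new head trains at the full rate.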

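Gradient clipping

```python
# clip_gradient projects every gradient element onto [-2, 2] before the update
gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 1e-2, 'clip_gradient': 2})
```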
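MXNet's optimizer option for this is `clip_gradient`, which clips element-wise. That differs from rescaling by the global L2 norm, the other common meaning of "gradient clipping". This plain-Python sketch (hypothetical function names, no MXNet needed) contrasts the two.

```python
import math
from typing import List


def clip_elementwise(grad: List[float], clip: float) -> List[float]:
    """Project every element onto [-clip, clip] (the behaviour of clip_gradient)."""
    return [max(-clip, min(clip, g)) for g in grad]


def clip_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Rescale all gradients jointly so their overall L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [[g * scale for g in grad] for grad in grads]


g = [3.0, 4.0]
print(clip_elementwise(g, 2.0))    # [2.0, 2.0]: the direction changes
print(clip_global_norm([g], 2.0))  # about [[1.2, 1.6]]: direction kept, norm 2
```

Element-wise clipping is cheap and per-parameter, but it can change the update direction. Norm-based clipping preserves the direction at the cost of computing a norm over all gradients.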

MXNet

API

reshape

```python
mxnet.ndarray.reshape(data=None, shape=_Null, reverse=_Null, target_shape=_Null, keep_highest=_Null, out=None, name=None, **kwargs)
"""
Some dimensions of the shape can take special values from the set {0, -1, -2, -3, -4}. The significance of each is explained below:
0 copy this dimension from the input to the output shape.
-1 infers the dimension of the output shape by using the remainder of the input dimensions keeping the size of the new array same as that of the input array. At most one dimension of shape can be -1.
-2 copy all/remainder of the input dimensions to the output shape.
-3 use the product of two consecutive dimensions of the input shape as the output dimension.
-4 split one dimension of the input into two dimensions passed subsequent to -4 in shape (can contain -1)
"""
```

Debug

Error message:

```
ValueError: You created Module with Module(..., data_names=['data']) but input with name 'data' is not found in symbol.list_arguments(). Did you mean one of: data0
```

Solution: the input variable of the symbol is named `data0` rather than `data`, so pass that name in `data_names`:

```python
mod = mx.mod.Module(symbol=sym, context=ctx, data_names=('data0',), label_names=None)
```
Error message:

```
DeferredInitializationError: Parameter disnet0_conv5_weight has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
```

Solution:

A layer was defined in the `__init__` of the network class but never used in `forward`. Deferred initialization only happens during the first forward pass, so that layer's parameters are never initialized. Deleting the unused layer fixes the error.

Pedestrian Attribute Notes

Posted on 2018-09-13 | In Attribute Recognition

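Pedestrian attribute recognition usually covers many attributes, such as gender, age, coat, trousers and luggage. In real projects we found that, compared with face attributes, pedestrian attribute recognition has its own characteristics and is harder:

  • Some attributes relate only to part of the body; e.g. whether the upper garment has long or short sleeves depends only on the upper body;
  • Training samples are hard, mainly because of partial bodies and occlusion (e.g. some person crops show only the upper or lower body, or one person bbox contains several people);
  • Mislabeled and ambiguous labels make up a large share (> 30% in real-world scenarios);
  • Some samples have missing labels;
  • The class distribution of the samples is imbalanced;
  • Some attributes are multi-class while others are binary, and the multi-class problems cannot simply be converted into binary ones;
  • Some attributes occupy a very small region of the image, e.g. sunglasses, or a backpack when the person faces the camera.

In short, the challenges in pedestrian attribute recognition can be summarized by the following keywords: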

  • Occlusion
  • Truncation
  • Imbalance
  • Noisy Label
  • Label Missing

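The simplest, brute-force design is to train an independent model for each attribute; when there are many attributes, even with small models a high-end GPU may not be able to keep up. A more scalable approach is to treat the problem as multi-task or multi-label learning. In particular, when some attributes have multiple classes ($\geq 3$), multi-task learning is the more reasonable choice.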
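A minimal sketch of how the two formulations combine in one loss, assuming a shared backbone whose heads produce one logit per binary attribute (multi-label, sigmoid) and one logit vector per attribute with three or more classes (multi-task, softmax). The function names, the attribute names used in the example, and the fixed per-task weights are hypothetical; choosing those weights well is exactly the "Adaptive Loss" problem listed below.

```python
import math
from typing import Dict, List


def bce_with_logit(z: float, y: int) -> float:
    """Numerically stable binary cross-entropy for one sigmoid output."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def softmax_ce(logits: List[float], y: int) -> float:
    """Cross-entropy for one softmax head, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[y]


def attribute_loss(binary_logits: Dict[str, float], binary_labels: Dict[str, int],
                   multi_logits: Dict[str, List[float]], multi_labels: Dict[str, int],
                   task_weights: Dict[str, float]) -> float:
    """Weighted sum of per-attribute losses from the heads of a shared backbone."""
    total = 0.0
    for name, z in binary_logits.items():    # binary attributes: one sigmoid each
        total += task_weights.get(name, 1.0) * bce_with_logit(z, binary_labels[name])
    for name, zs in multi_logits.items():    # >= 3 classes: a separate softmax head
        total += task_weights.get(name, 1.0) * softmax_ce(zs, multi_labels[name])
    return total


loss = attribute_loss({"gender": 0.0}, {"gender": 1},
                      {"age": [0.0, 0.0, 0.0]}, {"age": 2},
                      task_weights={"gender": 1.0, "age": 1.0})
print(loss)  # log(2) + log(3) = log(6), about 1.7918
```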

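Attribute recognition based on multi-task learning faces many challenges, such as at which layer the shared layers should branch into task-specific heads and how to weight the losses of different tasks. In summary, the challenges include the following: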

  • Network Structure
  • Adaptive Loss
  • Class Imbalance
  • Missing Labels
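Two of these challenges, class imbalance and missing labels, can be handled directly inside a binary cross-entropy. The following is a minimal sketch under stated assumptions: a missing label is encoded as `None` and simply skipped, and positives of attribute $j$ are up-weighted by $(1 - p_j) / p_j$, where $p_j \in (0, 1)$ is that attribute's positive ratio in the training set. This is one common re-weighting chosen for illustration, not the only option, and `masked_weighted_bce` is a hypothetical name.

```python
import math
from typing import List, Optional


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def masked_weighted_bce(logits: List[float], labels: List[Optional[int]],
                        pos_ratio: List[float]) -> float:
    """Mean BCE over the labelled attributes of one sample.

    labels[j] is None when attribute j is missing; it is skipped entirely.
    Positives of attribute j are weighted by (1 - p_j) / p_j (assumed scheme).
    """
    total, count = 0.0, 0
    for z, y, p in zip(logits, labels, pos_ratio):
        if y is None:               # missing label: no loss term at all
            continue
        pos_weight = (1.0 - p) / p  # rare positives get a larger weight
        # -log(sigmoid(z)) = softplus(-z);  -log(1 - sigmoid(z)) = softplus(z)
        total += pos_weight * y * softplus(-z) + (1 - y) * softplus(z)
        count += 1
    return total / count if count else 0.0


print(masked_weighted_bce([0.0, 3.0, 0.0], [1, None, 0], [0.2, 0.5, 0.5]))  # about 1.7329
```

Skipping a missing label is safer than treating it as negative, which would push the model towards predicting the absent class.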

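Some papers have proposed solutions to individual challenges listed above. So far, however, I have not seen a paper that addresses all of them together; perhaps this is just an engineering problem too trivial for the experts to bother with. In product deployment, though, all of these problems must be solved.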



CV&DL Awesome

Posted on 2018-09-13

This is an organized list of repositories about machine learning, computer vision, etc.
© 2019 Terry Wu