From: An end-to-end text spotter with text relation networks
Part | Type | Parameters (kernel size, stride, padding) | Out channels |
---|---|---|---|
Encoder | conv_gn_relu × 5 | [3, 1, 1] | 256 |
Encoder | max-pool × 1 | [2, 2, 0] | 256 |
Encoder | conv_gn_relu × 1 | [3, 1, 1] | 256 |
TRN | SAGL + GFPN | [3, 1, 1] | 256 |
Decoder | GRU with Attention | [3, 1, 1] | 256 |
Decoder | fully-connected | - | N_char |
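
To make the encoder rows concrete, the following minimal sketch traces how the spatial size of a feature map changes through the encoder, using the standard output-size formula for convolution and pooling, out = floor((in + 2·padding − kernel) / stride) + 1. The helper names (`out_size`, `encoder_output_shape`), the 32 × 100 input size, and the assumption of dilation 1 with floor rounding are illustrative and are not taken from the paper.

```python
def out_size(size, kernel, stride, padding):
    """Spatial size after a conv or pooling layer (assumes dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer type, repeat count, (kernel size, stride, padding)) for the encoder rows of the table
ENCODER = [
    ("conv_gn_relu", 5, (3, 1, 1)),
    ("max-pool", 1, (2, 2, 0)),
    ("conv_gn_relu", 1, (3, 1, 1)),
]


def encoder_output_shape(height, width, channels=256):
    """Return (channels, height, width) after the encoder for a given input size."""
    for _layer_type, repeats, (kernel, stride, padding) in ENCODER:
        for _ in range(repeats):
            height = out_size(height, kernel, stride, padding)
            width = out_size(width, kernel, stride, padding)
    return channels, height, width


# Hypothetical 32 x 100 input region (not a value from the paper).
print(encoder_output_shape(32, 100))  # (256, 16, 50)
```

Under these assumptions, the [3, 1, 1] convolutions preserve the spatial size, so the single [2, 2, 0] max-pool is the only layer that downsamples, halving height and width once.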