Table 1 The architecture of the recognition branch

From: An end-to-end text spotter with text relation networks

| Part | Type | Parameters (kernel size, stride, padding) | Out channels |
|---|---|---|---|
| Encoder | conv_gn_relu × 5 | [3, 1, 1] | 256 |
| Encoder | max-pool × 1 | [2, 2, 0] | 256 |
| Encoder | conv_gn_relu × 1 | [3, 1, 1] | 256 |
| TRN | SAGL + GFPN | [3, 1, 1] | 256 |
| Decoder | GRU with Attention | [3, 1, 1] | 256 |
| Decoder | fully-connected | - | N_char |
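
To make the (kernel size, stride, padding) triples concrete, the following minimal sketch propagates a feature-map size through the three Encoder rows of the table, using the standard output-size formula for convolution and pooling layers, out = floor((in + 2 × padding − kernel) / stride) + 1. The names `ENCODER_LAYERS`, `out_size`, and `encoder_output_shape` are hypothetical, the 32 × 128 input size is an illustrative assumption rather than a value from the paper, and the sketch assumes each layer uses the same parameters along height and width.

```python
from typing import List, Tuple

# (layer type, repeat count, kernel size, stride, padding),
# copied from the Encoder rows of Table 1.
ENCODER_LAYERS: List[Tuple[str, int, int, int, int]] = [
    ("conv_gn_relu", 5, 3, 1, 1),
    ("max-pool", 1, 2, 2, 0),
    ("conv_gn_relu", 1, 3, 1, 1),
]


def out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output length of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_output_shape(height: int, width: int) -> Tuple[int, int]:
    """Propagate a feature-map size through the Encoder rows of Table 1.

    Assumption: every layer applies the same kernel size, stride, and
    padding along both height and width.
    """
    for _name, repeat, kernel, stride, padding in ENCODER_LAYERS:
        for _ in range(repeat):
            height = out_size(height, kernel, stride, padding)
            width = out_size(width, kernel, stride, padding)
    return height, width


if __name__ == "__main__":
    # Hypothetical 32 x 128 input region; this size is not from the paper.
    print(encoder_output_shape(32, 128))
```

Under these assumptions, the [3, 1, 1] layers leave the spatial size unchanged and only the single [2, 2, 0] max-pool halves it, so the encoder reduces each spatial dimension by a factor of two while keeping 256 channels.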