### Topology-agnostic GCN

Because of possible attacks, watermarked meshes cannot simply be treated as template-based meshes, and in practice even original meshes may be non-template-based. To represent such meshes, we compose our convolution operation from isotropic filters, sharing a single fixed weight \(w_j = w_1\) in Eq. 1 across all neighboring vertices:

$$\begin{aligned} f_{i}^{l+1}=\phi \left( w_{0}f_{i}^{l}+\sum \limits _{j\in {\mathcal {N}}(i)} w_{1} f_{j}^{l}\right) . \end{aligned}$$

(2)

During training, we find that our network converges slowly. We attribute this to two factors: the watermark bits are randomly generated at each iteration step, and the connectivity differs from vertex to vertex. To speed up training and ensure convergence, we apply the degree normalization used in GCN and design the GraphConv+BatchNorm+ReLU block as the main component of our network. We first define our GraphConv operation:

$$\begin{aligned} f_{i}^{l+1}=w_{0}f_{i}^{l}+w_{1} \sum \limits _{j\in {\mathcal {N}}(i)} \frac{1}{|{\mathcal {N}}(i)|} f_{j}^{l}, \end{aligned}$$

(3)

where \(|\cdot |\) denotes cardinality, so \(|{\mathcal {N}}(i)|\) is the degree of vertex *i*. Unlike previous GCNs for generative tasks, our network does not assume a fixed topology: each 3D mesh has its own topology, so the topology-agnostic GCN must look up the neighboring vertices of every vertex for each mesh. For every mini-batch, we apply batch normalization to the output features of GraphConv. We then define the graph residual block as two GraphConv+BatchNorm+ReLU blocks with a shortcut connection (He et al. 2016), as shown in Fig. 2. For the initial block of the embedding sub-network and the extracting sub-network, the input feature is the 3D coordinates of the vertices and the output is a 64-dim feature. For all other blocks, the output feature has the same shape as the input feature, with 64 dimensions.
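As a minimal sketch of Eq. 3, the neighbor lookup and the degree-normalized aggregation could be written as follows. The helper names `build_neighbors` and `graph_conv`, the toy mesh, and the use of matrix-valued weights \(w_0, w_1\) that map 3-dim coordinates to 64-dim features are illustrative assumptions; BatchNorm, ReLU and the shortcut connection are omitted.

```python
import numpy as np

def build_neighbors(num_vertices, faces):
    """Collect the 1-ring neighbors N(i) of every vertex from triangle faces."""
    neighbors = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return [sorted(n) for n in neighbors]

def graph_conv(features, neighbors, w0, w1):
    """Eq. 3: f_i' = w0 f_i + w1 * (1 / |N(i)|) * sum_{j in N(i)} f_j."""
    out = features @ w0
    for i, nbrs in enumerate(neighbors):
        if nbrs:  # an isolated vertex keeps only its self term
            out[i] += features[nbrs].mean(axis=0) @ w1
    return out

# Hypothetical toy mesh: two triangles sharing the edge (1, 2).
faces = [(0, 1, 2), (1, 3, 2)]
vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
neighbors = build_neighbors(len(vertices), faces)

rng = np.random.default_rng(0)
w0 = rng.normal(size=(3, 64))  # assumed weight shapes for the initial block
w1 = rng.normal(size=(3, 64))
out = graph_conv(vertices, neighbors, w0, w1)
```

Because the neighbor lists are rebuilt per mesh, the same weights can be applied to meshes with different connectivity, which is the property the topology-agnostic design relies on.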

As shown in Fig. 3, our network includes a watermark embedding sub-network, attack layers and a watermark extracting sub-network. In the network, we define a 3D mesh as \({\mathcal {M}}=({\mathcal {V}}, {\mathcal {F}})\), where \({\mathcal {V}}\) denotes the vertices and \({\mathcal {F}}\) denotes the faces, and we use \(N_{in}\) to denote the number of input vertices. For each vertex \(i\in {{\mathcal {V}}}\), we use \({\mathbf {v}}_{i}=[x_{i},y_{i},z_{i}]^{\mathrm T}\in {\mathbb {R}}^3\) to denote its 3D coordinates in Euclidean space. The watermark length is *C* bits.

### Watermark embedding sub-network

In this sub-network, we take the original mesh \({\mathcal {M}}_{in}=({\mathcal {V}}_{in},{\mathcal {F}}_{in})\) and the watermark \({\mathbf {w}}_{in}\) as input. We employ five cascaded graph residual blocks to form the feature learning module \({\mathbf {F}}\), which first learns the feature map \(F_{in}\) from the input vertices \({\mathcal {V}}_{in}\). The watermark encoder \({\mathbf {E}}\) encodes the input watermark into a latent code \({\mathbf {z}}_{w}\) with a fully connected layer. The latent code \({\mathbf {z}}_{w}\) is then expanded along the vertex dimension so that every vertex receives a copy. After expanding, the latent code is concatenated with the input vertices \({\mathcal {V}}_{in}\) and the mesh feature \(F_{in}\), and then fed into the aggregation module \({\mathbf {A}}\). The aggregation module \({\mathbf {A}}\) includes two graph residual blocks; in its last block, a branch applies an extra GraphConv layer and outputs the 3D coordinates of the watermarked vertices \({\mathcal {V}}_{wm}\). From the original mesh \({\mathcal {M}}_{in}\) and the watermarked vertices \({\mathcal {V}}_{wm}\), the watermarked 3D mesh \({\mathcal {M}}_{wm}\) can be constructed. Note that the symmetric function *Expanding* aligns the watermark feature with the vertices, making the embedding process invariant to the reordering of input vertices, which is useful in practice.
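The expanding and concatenation step can be illustrated with a short sketch. The function name `expand_and_concat`, the 64-dim latent code, and the random toy inputs are assumptions made for illustration only. The final lines check that permuting the input vertices simply permutes the rows fed to the aggregation module.

```python
import numpy as np

def expand_and_concat(vertices, mesh_feat, z_w):
    """Tile the latent code z_w over all vertices and concatenate per vertex."""
    n = vertices.shape[0]
    z_expanded = np.tile(z_w, (n, 1))  # (N, D_z): one copy of z_w per vertex
    return np.concatenate([vertices, mesh_feat, z_expanded], axis=1)

rng = np.random.default_rng(0)
n, d_feat, d_z = 5, 64, 64             # d_z = 64 is an assumed latent size
vertices = rng.normal(size=(n, 3))     # stands in for V_in
mesh_feat = rng.normal(size=(n, d_feat))  # stands in for F_in
z_w = rng.normal(size=d_z)             # stands in for the encoder output

agg_in = expand_and_concat(vertices, mesh_feat, z_w)

# Reordering the vertices reorders the rows and changes nothing else.
perm = rng.permutation(n)
agg_in_perm = expand_and_concat(vertices[perm], mesh_feat[perm], z_w)
```

Since every vertex receives the same copy of the code, the watermark feature carries no dependence on vertex order.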

### Attack layers

To guarantee the adaptive robustness to specific attacks, we train our network with attacked meshes. In this paper, we mainly consider representative attacks (including cropping, Gaussian noise, rotation and smoothing) and integrate them into attack layers. Note that we can integrate different attacks as the attack layers, according to the actual requirements.

#### Rotation

We rotate the 3D mesh about all three axes. We use \(\theta\) to denote the rotation range, and the rotation angle about each axis is sampled independently: \(\theta _x,\theta _y,\theta _z\sim \textit{U}[-\theta ,\theta ]\). We then rotate \({\mathcal {V}}_{wm}\) by the corresponding angle about each axis in Euclidean space.
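A minimal sketch of this attack is given below. The composition order \(R_z R_y R_x\), the function names, and the random toy vertices are assumptions; the text above only specifies how the three angles are sampled.

```python
import numpy as np

def rotation_matrix(theta_x, theta_y, theta_z):
    """Compose rotations about the x, y and z axes (angles in radians)."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx  # assumed order; the source does not state one

def rotate_attack(v_wm, theta_deg, rng):
    """Sample theta_x, theta_y, theta_z ~ U[-theta, theta] and rotate the vertices."""
    angles = np.deg2rad(rng.uniform(-theta_deg, theta_deg, size=3))
    return v_wm @ rotation_matrix(*angles).T

rng = np.random.default_rng(0)
v_wm = rng.normal(size=(10, 3))                      # hypothetical watermarked vertices
v_att = rotate_attack(v_wm, theta_deg=15.0, rng=rng)  # theta = 15 degrees
```

The operation is a matrix product, so gradients can flow through it to the embedding sub-network.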

#### Gaussian noise

We employ a zero-mean Gaussian noise model and sample the standard deviation \(\sigma _{g} \sim \textit{U}[0,\sigma ]\) for each attack. We then generate \(\textit{noise} \sim {\mathcal {N}}(0,{\sigma _{g}} ^ {2})\) and add it to the 3D coordinates of the watermarked vertices.
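The two-stage sampling can be sketched as follows; the function name and the toy vertices are illustrative assumptions, and the noise is drawn independently per coordinate.

```python
import numpy as np

def gaussian_noise_attack(v_wm, sigma, rng):
    """Sample sigma_g ~ U[0, sigma], then add N(0, sigma_g^2) noise per coordinate."""
    sigma_g = rng.uniform(0.0, sigma)
    noise = rng.normal(0.0, sigma_g, size=v_wm.shape)
    return v_wm + noise, sigma_g

rng = np.random.default_rng(0)
v_wm = rng.uniform(-1.0, 1.0, size=(100, 3))   # hypothetical watermarked vertices
v_att, sigma_g = gaussian_noise_attack(v_wm, sigma=0.03, rng=rng)

# With sigma = 0 the attack degenerates to the identity.
v_same, _ = gaussian_noise_attack(v_wm, sigma=0.0, rng=rng)
```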

#### Smoothing

A Laplacian smoothing model (Taubin 2000) is employed to simulate possible smoothing operations. For the watermarked mesh \({\mathcal {M}}_{wm}=({\mathcal {V}}_{wm},{\mathcal {F}}_{wm})\), we first calculate the Laplacian matrix \({\mathbf {L}} \in {\mathbb {R}}^{N_{in} \times N_{in}}\), and use \(\alpha _{s} \sim \textit{U} [0,\alpha ]\) to control the level of Laplacian smoothing. For the coordinate matrix \({\mathbf {V}}_{wm} \in {\mathbb {R}}^{N_{in}\times 3}\) of the watermarked vertices \({\mathcal {V}}_{wm}\), we calculate the coordinate matrix \({\mathbf {V}}_{att}\) of the attacked vertices \({\mathcal {V}}_{att}\) as:

$$\begin{aligned} {\mathbf {V}}_{att}={\mathbf {V}}_{wm} - \alpha _{s} {\mathbf {L}} {\mathbf {V}}_{wm}. \end{aligned}$$

(4)
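Eq. 4 can be sketched as below. The form of the Laplacian is an assumption: we take \({\mathbf {L}}={\mathbf {I}}-{\mathbf {D}}^{-1}{\mathbf {A}}\), with adjacency matrix \({\mathbf {A}}\) and degree matrix \({\mathbf {D}}\), so that \(\alpha _{s}=1\) moves each vertex to the mean of its neighbors. Other weightings (for example cotangent weights) would fit Eq. 4 equally well. Function names and the toy mesh are hypothetical.

```python
import numpy as np

def laplacian_matrix(num_vertices, faces):
    """Assumed form L = I - D^{-1} A built from the triangle connectivity."""
    adj = np.zeros((num_vertices, num_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated vertices get a zero row so that smoothing leaves them in place.
    identity = np.diag((deg.ravel() > 0).astype(float))
    return identity - adj / np.maximum(deg, 1.0)

def smoothing_attack(v_wm, faces, alpha_s):
    """Eq. 4: V_att = V_wm - alpha_s * L @ V_wm."""
    lap = laplacian_matrix(len(v_wm), faces)
    return v_wm - alpha_s * (lap @ v_wm)

faces = [(0, 1, 2), (1, 3, 2)]
v_wm = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.5]])

rng = np.random.default_rng(0)
alpha_s = rng.uniform(0.0, 0.2)               # alpha_s ~ U[0, alpha], alpha = 0.2
v_att = smoothing_attack(v_wm, faces, alpha_s)
v_full = smoothing_attack(v_wm, faces, 1.0)   # each vertex moves to its neighbor mean
```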

#### Cropping

We simulate this attack by cutting off a part of the mesh. We first normalize the vertices into a unit square and search for the farthest point in the negative quadrant and the farthest point in the positive quadrant. We then connect the two points and simulate a knife cutting perpendicular to this line, which removes part of the mesh. \(\beta\) controls the minimum ratio of the mesh that is retained, and \(\beta _{c}\sim \textit{U}[\beta ,1]\) denotes the actual retained ratio in each cropping operation.
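One possible reading of this procedure is sketched below, under several assumptions that the text does not fix: the vertices are normalized into \([-1,1]^3\) around the bounding-box center, "farthest" is measured from the origin, the retained ratio is counted in vertices, and the part removed lies on the positive end of the line. The function name and the cube example are hypothetical.

```python
import itertools
import numpy as np

def crop_attack(vertices, faces, beta_c):
    """Keep about a beta_c fraction of the vertices on one side of a cutting plane."""
    # Assumption: normalize into [-1, 1]^3 around the bounding-box center.
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    normed = (vertices - (lo + hi) / 2.0) / np.max(hi - lo) * 2.0
    dist = np.linalg.norm(normed, axis=1)
    neg = np.all(normed < 0, axis=1)  # all-negative octant
    pos = np.all(normed > 0, axis=1)  # all-positive octant
    if not neg.any() or not pos.any():
        raise ValueError("this sketch needs a vertex in both octants")
    p_neg = normed[np.argmax(np.where(neg, dist, -np.inf))]
    p_pos = normed[np.argmax(np.where(pos, dist, -np.inf))]
    axis = (p_pos - p_neg) / np.linalg.norm(p_pos - p_neg)
    # A plane perpendicular to `axis` separates vertices by their projection.
    proj = normed @ axis
    n_keep = int(np.ceil(beta_c * len(vertices)))
    keep = np.sort(np.argsort(proj, kind="stable")[:n_keep])
    # Re-index the surviving vertices and drop faces that lost a vertex.
    remap = np.full(len(vertices), -1)
    remap[keep] = np.arange(n_keep)
    faces = np.asarray(faces)
    face_ok = np.all(remap[faces] >= 0, axis=1)
    return vertices[keep], remap[faces[face_ok]]

# Hypothetical example: the 8 corners of a cube; vertex 7 is (0.5, 0.5, 0.5).
vertices = np.array(list(itertools.product([-0.5, 0.5], repeat=3)))
faces = [(0, 1, 2), (5, 6, 7)]
cropped_v, cropped_f = crop_attack(vertices, faces, beta_c=0.875)
```

In training, `beta_c` would be drawn from \(\textit{U}[\beta ,1]\) for each cropping operation rather than fixed as in this example.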

During training, we set the hyperparameters as follows: \(\theta =15^{\circ }, \sigma =0.03, \alpha =0.2, \beta =0.8\). Besides the four attacks, we also integrate one identity layer that applies no attack, to ensure the performance when the mesh is not attacked. In each mini-batch, we randomly select one attack as the attack layer. The attacked mesh \({\mathcal {M}}_{att}=({\mathcal {V}}_{att},{\mathcal {F}}_{att})\) is then generated by passing the watermarked mesh \({\mathcal {M}}_{wm}=({\mathcal {V}}_{wm},{\mathcal {F}}_{wm})\) through the attack layer. Figure 4 shows the original and attacked meshes under different attacks. Because the attack layers are differentiable, we can jointly train our embedding sub-network and extracting sub-network and update their parameters simultaneously.

### Watermark extracting sub-network

We design a straightforward structure to extract the watermark. For the attacked vertices \({\mathcal {V}}_{att}\), we first employ the same feature learning module \({\mathbf {F}}\) to acquire the feature map \(F_{no}\). A global average pooling layer followed by a two-layer fully connected network (MLP) then produces the extracted watermark \({\mathbf {w}}_{ext}\). The symmetric function *Global pooling* aggregates information from all vertices, which also guarantees invariance under the vertex reordering attack.
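The pooling and MLP head can be sketched as follows. The hidden width of 32, the ReLU between the two layers, the watermark length of 16 bits, and the random weights are assumptions for illustration. The last lines check that reordering the vertices leaves the output unchanged.

```python
import numpy as np

def extract_watermark(feat, w1, b1, w2, b2):
    """Global average pooling over vertices followed by a two-layer MLP."""
    pooled = feat.mean(axis=0)                  # (64,): symmetric in the vertex order
    hidden = np.maximum(pooled @ w1 + b1, 0.0)  # assumed ReLU activation
    return hidden @ w2 + b2                     # (C,): extracted watermark

rng = np.random.default_rng(0)
n, d_feat, d_hidden, c_bits = 50, 64, 32, 16    # d_hidden and c_bits are hypothetical
feat = rng.normal(size=(n, d_feat))             # stands in for the feature map F_no
w1, b1 = rng.normal(size=(d_feat, d_hidden)), np.zeros(d_hidden)
w2, b2 = rng.normal(size=(d_hidden, c_bits)), np.zeros(c_bits)

w_ext = extract_watermark(feat, w1, b1, w2, b2)
w_ext_perm = extract_watermark(feat[rng.permutation(n)], w1, b1, w2, b2)
```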

### Loss function

To train the network, we define several loss functions. The mean square error (MSE) loss is first employed to constrain the extracted watermark and the watermarked mesh vertices:

$$\begin{aligned} l_{w}({\mathbf {w}}_{in},{\mathbf {w}}_{ext})= & {} \frac{1}{C}\vert \vert {\mathbf {w}}_{in}-{\mathbf {w}}_{ext}\vert \vert _2^2, \end{aligned}$$

(5)

$$\begin{aligned} l_{m}({\mathcal {M}}_{in},{\mathcal {M}}_{wm})= & {} \frac{1}{N_{in}}\sum _{i\in {{\mathcal {V}}_{in}}} \vert \vert { {{\mathbf {v}}_{i} - {\mathbf {v}}_{i'}}} \vert \vert _2^2, \end{aligned}$$

(6)

where \(i'\) denotes the paired vertex of vertex *i* in the watermarked mesh \({\mathcal {M}}_{wm}\).
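Eqs. 5 and 6 translate directly into code. The sketch below assumes that paired vertices share the same row index in both coordinate matrices; the function names and the numeric values are illustrative.

```python
import numpy as np

def watermark_loss(w_in, w_ext):
    """Eq. 5: squared L2 distance between the watermarks, divided by C."""
    return np.sum((w_in - w_ext) ** 2) / w_in.size

def mesh_loss(v_in, v_wm):
    """Eq. 6: squared vertex displacements summed and divided by N_in."""
    return np.sum((v_in - v_wm) ** 2) / v_in.shape[0]

w_in = np.array([1.0, 0.0, 1.0, 1.0])
w_ext = np.array([0.5, 0.0, 1.0, 0.5])
l_w = watermark_loss(w_in, w_ext)   # (0.25 + 0.25) / 4 = 0.125

v_in = np.zeros((2, 3))
v_wm = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]])
l_m = mesh_loss(v_in, v_wm)         # (0.01 + 0.04) / 2 = 0.025
```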

\(l_m\) constrains the spatial modification of the mesh vertices as a whole. The local geometric smoothness should also be preserved, as it greatly affects human visual perception (Mariani et al. 2020). The local curvature reflects the smoothness of the surface (Torkhani et al. 2012). For 3D meshes, the local curvature should be defined based on the connectivity. As shown in Fig. 5, we use \(\theta _{ij}\in [0^{\circ },180^{\circ }]\) to represent the angle between the normalized normal vector \({\mathbf {n}}_i\) of vertex *i* and the direction from vertex *i* to its neighboring vertex *j*. The angles between a vertex and its neighbors thus characterize the local geometry. For each vertex *i* in the mesh \({\mathcal {M}}\), we define the vertex curvature as:

$$\begin{aligned} cur(i,{\mathcal {M}}) =\sum _{j\in {\mathcal {N}}(i)} \mathrm{cos } (\theta _{ij}), \end{aligned}$$

(7)

where

$$\begin{aligned} \mathrm{cos } (\theta _{ij}) = \frac{({\mathbf {v}}_{j} - {\mathbf {v}}_{i})^\mathrm{T}{\mathbf {n}}_i}{\vert \vert {\mathbf {v}}_j - {\mathbf {v}}_{i} \vert \vert _2}. \end{aligned}$$

(8)

To guarantee the local curvature consistency between original 3D mesh \({\mathcal {M}}_{in}\) and watermarked 3D mesh \({\mathcal {M}}_{wm}\), we define the curvature consistency loss function:

$$\begin{aligned} l_{cur}({\mathcal {M}}_{in},{\mathcal {M}}_{wm})=\frac{1}{N_{in}} \sum \limits _{i\in {{\mathcal {V}}_{in}}} \vert \vert (cur(i,{\mathcal {M}}_{in})-cur(i',{\mathcal {M}}_{wm}))\vert \vert _{2}^{2}. \end{aligned}$$

(9)
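Eqs. 7 to 9 can be sketched as follows. How the vertex normal \({\mathbf {n}}_i\) is obtained is not specified above, so the sketch assumes area-weighted averaging of the incident face normals. The function names and the toy meshes are hypothetical, and paired vertices are assumed to share the same index.

```python
import numpy as np

def build_neighbors(num_vertices, faces):
    """1-ring neighbors N(i) of every vertex from triangle faces."""
    neighbors = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return [sorted(n) for n in neighbors]

def vertex_normals(vertices, faces):
    """Assumed normals: sum of incident (area-weighted) face normals, normalized."""
    normals = np.zeros_like(vertices)
    for a, b, c in faces:
        face_n = np.cross(vertices[b] - vertices[a], vertices[c] - vertices[a])
        normals[[a, b, c]] += face_n
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def curvature(vertices, faces, neighbors):
    """Eqs. 7 and 8: cur(i) = sum_j (v_j - v_i)^T n_i / ||v_j - v_i||."""
    normals = vertex_normals(vertices, faces)
    cur = np.zeros(len(vertices))
    for i, nbrs in enumerate(neighbors):
        edges = vertices[nbrs] - vertices[i]
        cur[i] = np.sum(edges @ normals[i] / np.linalg.norm(edges, axis=1))
    return cur

def curvature_loss(v_in, v_wm, faces, neighbors):
    """Eq. 9: mean squared curvature difference over paired vertices."""
    diff = curvature(v_in, faces, neighbors) - curvature(v_wm, faces, neighbors)
    return np.mean(diff ** 2)

faces = [(0, 1, 2), (1, 3, 2)]
neighbors = build_neighbors(4, faces)
v_flat = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
v_bent = v_flat.copy()
v_bent[3, 2] = 0.5                    # lift one vertex out of the plane

cur_flat = curvature(v_flat, faces, neighbors)   # flat patch: all cosines vanish
l_cur = curvature_loss(v_flat, v_bent, faces, neighbors)
```

On the flat patch every neighbor direction is orthogonal to the normal, so the curvature is zero at all vertices, while lifting a single vertex produces a positive loss.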

The combined objective is employed in the network: \({\mathcal {L}} = \lambda _1 l_w + \lambda _{2} l_{cur} + \lambda _{3} l_{m}\). By default, \(\lambda _1=\lambda _{2}=1\), and \(\lambda _{3} = 5\).