Current research methodologies in CAD model development frequently diverge from established engineering design practice. They often fail to reflect the workflows of practicing engineers or the sequential stages that precede a final design output. This disconnect excludes critical design activities – such as system analysis, the application of design rules, and adherence to design criteria – thereby diminishing the practical relevance and translational value of the research. Moreover, much of the existing work focuses solely on final engineering deliverables without adequately addressing the intermediate processes and decision-making steps that rigorous design practice requires.
Despite the advances noted above, current deep learning approaches exhibit fundamental limitations when it comes to truly understanding engineering drawings. This section examines the semantic gap – the disconnect between what neural networks detect in a drawing and the actual engineering meaning – focusing on CNNs and GNNs:
Limitations of CNN-Based Approaches: CNNs process drawings as pixel grids, excelling at pattern recognition but lacking an inherent notion of objects or relationships. As a result, CNNs tend to recognize drawing elements in isolation rather than understanding their global context. Key shortcomings include:
Fragmented Perception: A CNN might detect lines, curves, or text regions based on local features, but it does not inherently know that, say, two parallel lines with arrowheads and a number in between form a dimension annotation. The model sees parts of a drawing but not the “whole picture” of how those parts relate semantically. This local focus makes it difficult to capture the global constraints of a drawing (e.g. that multiple views must be consistent).
Positional Imprecision: Standard CNN architectures introduce invariances (through pooling layers) that are useful for natural images but problematic for CAD drawings where precision is paramount. Engineering drawings require precise spatial relationships – a slight shift in position can change meaning – yet CNNs may treat two slightly different placements as the same due to their tolerance for translation. Important semantic details like alignment of text with a line or exact geometric proportions can be lost.
No Built-in Knowledge of Rules: Perhaps most critically, CNNs have no intrinsic knowledge of engineering conventions unless they somehow infer statistical correlations from vast amounts of data. They do not understand that a leader arrow pointing to a circle means a diameter specification, or that certain symbols imply specific real-world components. Design standards (such as “a section view cutting-plane line must be drawn as a long dash followed by two short dashes”) are effectively arbitrary patterns from a CNN’s point of view – the network will only mimic such rules if explicitly trained on many examples covering all variations. If the training data is limited (as is often the case for niche engineering symbols), CNNs generalize poorly [2]. In practice, datasets of labeled engineering drawings are scarce [1], so CNN-based methods often face data scarcity and class imbalance issues, leading to brittle performance. For example, one study found that CNN symbol detectors ranged from near-perfect accuracy to almost zero, depending on the training data distribution [4]. This volatility underscores how purely statistical pattern learners struggle with the diversity and precision of real engineering drawings. In summary, CNNs provide powerful visual feature extraction, but by themselves they miss the semantic relationships and logical constraints that engineers associate with those features.
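The positional-imprecision problem noted above can be demonstrated directly. The following minimal sketch (a plain-numpy 2×2 max-pooling, standing in for a CNN's pooling layer) shows that two drawing lines one pixel apart become indistinguishable after a single pooling step, even though such a shift can change the meaning of an engineering drawing:

```python
import numpy as np

def max_pool_2x2(img: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 (no padding) -- the translation-
    tolerant downsampling step used in many CNN architectures."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A 1-pixel-wide vertical "drawing line" at column 2 ...
a = np.zeros((8, 8))
a[:, 2] = 1.0
# ... and the same line shifted by one pixel, to column 3.
b = np.zeros((8, 8))
b[:, 3] = 1.0

# The inputs differ, but the pooled feature maps are identical:
# the one-pixel shift is erased by the pooling invariance.
print(np.array_equal(a, b))                            # False
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```

Columns 2 and 3 fall into the same 2×2 pooling window, so both placements collapse to the same downsampled representation – the mechanism by which alignment and exact-position cues can be lost.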
Strengths and Limits of GNN-Based Methods: GNNs were introduced to address some of CNNs’ weaknesses by operating on graph representations rather than raw pixel grids. In the context of engineering drawings, a GNN can take as input a graph where nodes represent entities (e.g. line segments, text blocks, symbols) and edges represent relationships (e.g. connectivity or proximity). This imbues the model with a sense of the drawing’s topology – for instance, a GNN could learn that a certain text node is connected to a line node, indicating a label attached to a line.
The strength of GNNs lies in structural modeling: they can capture relationships like adjacency, connectivity, or grouping naturally, which is difficult for a CNN to learn implicitly [3]. In tasks such as classifying a drawing or extracting a subgraph (e.g., isolating the part outline vs. dimension lines), GNNs can outperform CNNs by using relational cues.
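As a concrete illustration of the graph view described above, the sketch below builds a tiny drawing graph and runs one round of mean-aggregation message passing. The entity names are hypothetical, and the unweighted averaging is a plain-Python stand-in for a learned GNN layer, not any particular published architecture:

```python
import numpy as np

# Toy drawing graph: nodes are drawing entities with simple one-hot
# "entity type" features; edges are relations such as "text attached
# to line" (all names illustrative only).
nodes = {
    "line_1":  np.array([1.0, 0.0, 0.0]),   # geometric line
    "line_2":  np.array([1.0, 0.0, 0.0]),
    "text_10": np.array([0.0, 1.0, 0.0]),   # the string "10" near line_1
    "arrow_a": np.array([0.0, 0.0, 1.0]),   # an arrowhead touching line_1
}
edges = [("text_10", "line_1"), ("arrow_a", "line_1"), ("line_1", "line_2")]

def message_pass(nodes, edges):
    """One round of mean-aggregation message passing (a minimal GNN layer
    without learned weights): each node averages its own feature vector
    with those of its neighbours."""
    neigh = {n: [] for n in nodes}
    for u, v in edges:                     # treat edges as undirected
        neigh[u].append(nodes[v])
        neigh[v].append(nodes[u])
    return {n: np.mean([nodes[n]] + neigh[n], axis=0) for n in nodes}

updated = message_pass(nodes, edges)
# After one round, line_1's embedding mixes in the text and arrow
# features -- exactly the relational cue ("a label and an arrowhead
# attach to this line") that a pixel-level CNN must learn implicitly.
print(updated["line_1"])
```

After the update, `line_1` carries information from its attached text and arrowhead, while an unconnected node would be unchanged – the topological awareness that motivates graph representations of drawings.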
However, GNNs alone still have important limitations:
Dependency on Correct Graph Extraction: A GNN is only as good as the graph it operates on. Converting a raw drawing into a meaningful graph of entities often requires its own set of algorithms or neural detectors. Errors or omissions in this conversion (e.g., missing an edge between two lines that actually connect) can mislead the GNN.
Lack of Symbolic Reasoning and Rule Enforcement: While GNNs encode relationships, they do so in a numeric, learned manner – they propagate messages between nodes but do not apply explicit logical rules derived from standards. A GNN does not inherently know engineering rules either; it might learn common patterns like “arrowhead nodes often attach to dimension line nodes,” but it cannot guarantee rule compliance or understand the meaning behind the rule. It may still output a graph configuration that violates a known drafting standard if that configuration is statistically common in the training data or yields a higher score based on learned correlations. In other words, GNNs do not perform explicit symbolic reasoning or enforce hard constraints based on codified knowledge; they operate in the realm of probabilities learned from data.
Scalability and Complexity: As drawings grow in complexity, the graphs can become very large (hundreds or thousands of nodes and relations), pushing GNNs to their limits in terms of computational complexity and risk of over-smoothing (losing feature uniqueness across the graph). Also, training GNNs requires a significant amount of graph-annotated data, which is just as scarce as pixel-annotated data in engineering domains. In practice, current GNN-based drawing analyses still incorporate heuristic or rule-based steps. For example, Xie et al. had to pre-process drawings to remove tables and dimension lines using algorithms before applying their final GNN [3]. This highlights that pure GNN solutions often need some built-in knowledge or pre-processing informed by drawing conventions to work effectively.
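The kind of convention-informed pre-processing mentioned above can be sketched as a filtering pass over the extracted entity graph. The node labels and the drop set below are hypothetical illustrations of the general idea, not a reconstruction of Xie et al.'s actual pipeline:

```python
# Hypothetical extracted graph: node -> heuristic label, plus edges.
nodes = {
    "e1": "outline", "e2": "outline",
    "e3": "dimension_line", "e4": "dimension_text",
    "e5": "table_border",
}
edges = [("e1", "e2"), ("e3", "e4"), ("e2", "e3"), ("e4", "e5")]

# Rule-based filter applied BEFORE any learning: drawing conventions
# tell us that dimension annotations and tables are not part geometry,
# so they are removed before the GNN ever sees the graph.
DROP = {"dimension_line", "dimension_text", "table_border"}

kept_nodes = {n: lbl for n, lbl in nodes.items() if lbl not in DROP}
kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]

print(sorted(kept_nodes))  # only the part-outline entities remain
print(kept_edges)
```

The point is that the codified knowledge lives in the hand-written filter, not in the learned model – the GNN operates only on whatever the heuristics leave behind.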
Semantic Gap and Knowledge Integration Challenges: Both CNNs and GNNs, as primarily data-driven learners, fundamentally operate at the level of pattern recognition rather than true semantic understanding. The semantics of an engineering drawing – the intent behind a symbol, the functional role of a depicted feature, compliance with drafting standards – are not directly captured by these models. Bridging this gap requires incorporating external knowledge (e.g., a library of symbols with their meanings, or a set of if-then rules about how dimensions are expressed). However, integrating such knowledge bases with deep learning is non-trivial. One challenge is symbol grounding: how to link abstract engineering concepts (like “center line” or “datum reference”) to the raw pixels or graph nodes that the network sees [5, 8]. Another challenge is that deep models output probabilistic predictions, whereas design rules are often absolute and deterministic – reconciling the two when they disagree is difficult. Without careful design, a naive combination might result in a system that is too rigid (over-constrained by rules) or one that ignores rules except as post-processing checks. Researchers have noted that truly grounding symbolic rules in data-driven learned representations remains an open problem [5]. In the case of engineering drawings, the semantics are grounded in real-world geometry and standards: for example, a particular geometric tolerance symbol implies a specific requirement on the manufactured part. Current deep models do not connect these dots. This inability to inherently handle design rules, semantic relationships, and symbolic reasoning is what most limits CNNs and GNNs – and what motivates a hybrid approach.
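One minimal way to frame the reconciliation problem described above is a post-hoc rule check over probabilistic detections. The sketch below uses a hypothetical detection format and an invented drafting rule purely for illustration; it shows why a rule relegated to post-processing can only veto a network's output, not guide it:

```python
# Hypothetical detections from a neural model: each carries a class
# label, a confidence score, and the entities it claims to relate.
detections = [
    {"kind": "dimension", "conf": 0.92, "text": "10", "attached_to": "line_1"},
    {"kind": "dimension", "conf": 0.81, "text": None, "attached_to": "line_2"},
]

def rule_dimension_has_text(det) -> bool:
    """Deterministic drafting rule (illustrative): a dimension annotation
    must carry a value. The network's confidence score is irrelevant to
    the rule -- it holds absolutely or not at all."""
    return det["kind"] != "dimension" or det["text"] is not None

violations = [d for d in detections if not rule_dimension_has_text(d)]
# The 0.81-confidence detection violates the rule despite its high
# score: the symbolic check can reject it, but it cannot tell the
# model what the correct interpretation would have been.
print(len(violations))  # 1
```

This is the over-rigid-versus-ignored trade-off in miniature: applied as a hard filter, the rule discards a confident prediction outright; applied only as a logged warning, it never influences the model at all.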