Tech

GitHub user Hawzen applies PCA to analyse Jurassic-age fossil in Saudi Arabia

Data-driven approach identifies Sphincterochila candidissima as the closest visual match, despite a significant chronological mismatch with the region’s late Jurassic history.

Author
Owen Mercer
Markets and Finance Editor
Published
Draft
Source: Hacker News
Morphological analysis of Alghat desert discovery points to possible convergent evolution rather than direct lineage

GitHub user Hawzen has published an open-source repository detailing a morphological analysis of a solid rock resembling a seashell, found in the Alghat desert, Saudi Arabia. The find, located at the base of a cliff approximately 500 kilometres from the nearest coastline at Dammam, prompted a data-driven investigation into the object’s origins. The geological context suggests the region was submerged during the late Jurassic period, around 150 million years ago, which could account for a shell-like fossil so far inland. To identify potential biological relatives, the author used Principal Component Analysis (PCA) to compare the fossil’s shape against a published dataset of shell images.

The analysis began by sampling the fossil’s contour at 256 points, each measured relative to the shape’s centre, which produced a high-dimensional representation of its outline. To manage this complexity, the author applied PCA to reduce the dimensionality of the data; the first two principal components alone retained 67.25 per cent of the variance. This mapped the fossil into a two-dimensional latent space in which PC1 captured the 'pointiness' of the shell and PC2 captured its symmetry. The baseline for comparison was the Zhang et al. shell dataset, which comprises 59,244 images covering 7,894 species.
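The contour-and-PCA pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from Hawzen’s repository: the function names (`radial_signature`, `pca_fit`, `pca_transform`), the choice of centre-relative radii as the 256-point representation, and the synthetic outlines are all hypothetical stand-ins.

```python
import numpy as np


def radial_signature(contour, n_points=256):
    """Resample a closed 2-D outline into n_points centre-relative radii.

    Assumption: every ray from the centre crosses the outline once
    (a "star-shaped" contour). Real shell outlines may need more care.
    """
    contour = np.asarray(contour, dtype=float)
    centre = contour.mean(axis=0)                  # centroid of the outline points
    rel = contour - centre
    angles = np.arctan2(rel[:, 1], rel[:, 0])      # angle of each outline point
    radii = np.hypot(rel[:, 0], rel[:, 1])         # distance from the centre
    target = np.linspace(-np.pi, np.pi, n_points, endpoint=False)
    # Periodic interpolation onto evenly spaced angles.
    signature = np.interp(target, angles, radii, period=2 * np.pi)
    return signature / signature.max()             # remove overall scale


def pca_fit(X, n_components=2):
    """Plain PCA via SVD: returns mean, components, explained-variance ratio."""
    mean = X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    variance = singular_values ** 2
    ratio = variance / variance.sum()
    return mean, vt[:n_components], ratio[:n_components]


def pca_transform(X, mean, components):
    """Project signatures onto the retained principal components."""
    return (X - mean) @ components.T


# Synthetic outlines standing in for shell contours (hypothetical data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
shapes = []
for _ in range(60):
    a, b = rng.uniform(-0.2, 0.2, size=2)
    r = 1.0 + a * np.cos(2 * t) + b * np.cos(3 * t)
    shapes.append(np.column_stack([r * np.cos(t), r * np.sin(t)]))

X = np.array([radial_signature(s) for s in shapes])   # shape (60, 256)
mean, components, ratio = pca_fit(X, n_components=2)
scores = pca_transform(X, mean, components)           # shape (60, 2)
print("variance retained by two components:", ratio.sum())
```

The sum of `ratio` plays the role of the 67.25 per cent figure reported for the real dataset; the value printed here reflects only the synthetic shapes and says nothing about the fossil.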

The morphological analysis identified Sphincterochila candidissima as the closest visual match to the Alghat fossil. However, this identification presents a significant chronological discrepancy. The earliest fossil records for Sphincterochila candidissima date to approximately 38 million years ago, more than 100 million years after the late Jurassic period when the Arabian Peninsula was submerged under the sea. The author noted that while the shape similarity is striking, the temporal gap makes a direct lineage unlikely.
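Finding the "closest visual match" amounts to a nearest-neighbour lookup in the two-dimensional latent space. The sketch below assumes Euclidean distance between PCA scores, which the source does not confirm; `closest_matches`, the coordinates, and the placeholder species labels are hypothetical.

```python
import numpy as np


def closest_matches(query, scores, labels, k=3):
    """Return the k labels whose latent-space points lie nearest the query.

    Assumption: similarity is plain Euclidean distance in PC space.
    """
    scores = np.asarray(scores, dtype=float)
    dists = np.linalg.norm(scores - np.asarray(query, dtype=float), axis=1)
    order = np.argsort(dists)[:k]                  # indices of the k smallest distances
    return [(labels[i], float(dists[i])) for i in order]


# Hypothetical (PC1, PC2) coordinates for three reference species.
demo_scores = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
demo_labels = ["species_a", "species_b", "species_c"]

# Hypothetical latent-space position of the specimen being identified.
result = closest_matches(np.array([0.1, 0.0]), demo_scores, demo_labels, k=2)
for name, distance in result:
    print(name, round(distance, 3))
```

A small distance here only says two outlines look alike after projection; as the article goes on to note, it carries no information about age or ancestry.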

Given the chronological mismatch, the author suggested that the similarity may instead indicate convergent evolution, a biological phenomenon in which unrelated species evolve similar shapes under similar environmental pressures. The repository acknowledges that morphology alone is unlikely to settle questions of lineage, because different species can look alike without being closely related. The author emphasised that a proper identification would require detailed analysis of the surrounding sediment and expert palaeontological review, both of which were beyond their own expertise.

The analysis highlights the growing accessibility of statistical tools for amateur scientific inquiry. By making the code and methodology public, Hawzen invited others to explore the shell latent space and test other specimens. The project shows how open-source software can serve scientific curiosity, even when conclusions drawn from data-driven methods still require validation by domain experts.
