Flying-ArUco v2 Dataset

This is a synthetic dataset composed of images containing various ArUco markers overlaid on backgrounds sampled from the MS COCO 2017 training dataset, used to train the models in "DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions". The contents of each file are the following:

  • flyingarucov2.tar.gz: Base FlyingArUco v2 images, without any augmentation applied. For each image a .json file is provided with the ground-truth positions of each marker and encoded IDs.
  • det_luma.tar.gz: Detection dataset built from the base FlyingArUco v2 dataset, using only (simulated) changes in lighting in order to obtain augmented samples.
  • det_luma_b.tar.gz: Detection dataset built from the base FlyingArUco v2 dataset, using (simulated) changes in lighting and blur in order to obtain augmented samples.
  • det_luma_bc.tar.gz: Detection dataset built from the base FlyingArUco v2 dataset, using (simulated) changes in lighting, blur and color shift in order to obtain augmented samples.
  • det_luma_bn.tar.gz: Detection dataset built from the base FlyingArUco v2 dataset, using (simulated) changes in lighting, blur and Gaussian noise in order to obtain augmented samples.
  • det_luma_bnc.tar.gz: Detection dataset built from the base FlyingArUco v2 dataset, using (simulated) changes in lighting, blur, Gaussian noise and color shift in order to obtain augmented samples (an illustrative sketch of these augmentation types is given after this list).
  • reg_luma_bc.tar.gz: Corner refinement (regression)/marker decoding dataset built from the det_luma_bc detection dataset.
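
The following sketch illustrates the kinds of augmentations named above using OpenCV and NumPy. It is purely illustrative: the parameter ranges are arbitrary and this is not the code used to generate the datasets.

import cv2
import numpy as np

def augment(img, seed=None):
    # Illustrative lighting / blur / noise / color-shift augmentation;
    # NOT the actual pipeline used to build the datasets above.
    rng = np.random.default_rng(seed)
    out = img.astype(np.float32)
    # Simulated lighting change: random gain and bias
    out = out * rng.uniform(0.4, 1.6) + rng.uniform(-40, 40)
    # Blur: Gaussian blur with a random (odd) kernel size
    k = 2 * int(rng.integers(0, 3)) + 1
    out = cv2.GaussianBlur(out, (k, k), 0)
    # Gaussian noise
    out = out + rng.normal(0.0, 5.0, out.shape)
    # Color shift: independent per-channel offset
    out = out + rng.uniform(-15, 15, size=(1, 1, 3))
    return np.clip(out, 0, 255).astype(np.uint8)

aug = augment(cv2.imread("000000000089.jpg"))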

Examples

Some examples from the dataset are the following:


Dataset structure

The base FlyingArUco v2 dataset (as provided in flyingarucov2.tar.gz) consists of 2500 images generated by overlaying ArUco markers and fake samples on backgrounds sourced from the MS COCO 2017 training set. For every image, there is a JSON file containing the ground-truth positions (2D pixel coordinates) of the corners of each (real) marker in the image, as well as its ID and its rotation (i.e., the number of times the marker has been rotated 90° counterclockwise to reach its current orientation in the image). Please refer to the following example for clarification:

{
    "markers": [
        {
            "id": 29,
            "corners": [
                [
                    481.7849807739258,
                    261.83265686035156
                ],
                [
                    516.1624603271484,
                    203.50667572021484
                ],
                [
                    451.4819869995117,
                    169.94456100463867
                ],
                [
                    425.1454734802246,
                    229.83663177490234
                ]
            ],
            "rot": 2
        },
        ...
    ]
}

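For instance, the ground truth for a given image can be loaded with the standard json module (the file name below is hypothetical, following the naming used in the examples further down):

import json

with open("000000000089.json") as f:
    gt = json.load(f)

for marker in gt["markers"]:
    # Four [x, y] pixel coordinates, plus the marker ID and its rotation
    print(marker["id"], marker["rot"], marker["corners"])
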
Detection datasets (such as det_luma.tar.gz) are provided in the format expected by Ultralytics YOLOv8 for training. The directory structure is the following:

det_luma
├── train
│   ├── images
│   │   ├── 000000000089.jpg
│   │   ├── 000000000315.jpg
│   │   ├── 000000000349.jpg
│   │   └── ...
│   └── labels
│       ├── 000000000089.txt
│       ├── 000000000315.txt
│       ├── 000000000349.txt
│       └── ...
└── valid
    ├── images
    │   ├── 000000000486.jpg
    │   ├── 000000000612.jpg
    │   ├── 000000000913.jpg
    │   └── ...
    └── labels
        ├── 000000000486.txt
        ├── 000000000612.txt
        ├── 000000000913.txt
        └── ...
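
Given this layout, a detection dataset can be used directly with the Ultralytics trainer. The following is a minimal sketch, assuming the standard Ultralytics Python API; the dataset description file, paths and class name are hypothetical choices of ours.

from pathlib import Path
from ultralytics import YOLO

# Hypothetical dataset description file pointing at the extracted det_luma directory
Path("det_luma.yaml").write_text(
    "path: det_luma\n"
    "train: train/images\n"
    "val: valid/images\n"
    "names:\n"
    "  0: marker\n"
)

model = YOLO("yolov8n.pt")  # any YOLOv8 variant
model.train(data="det_luma.yaml", epochs=100, imgsz=640)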

The images directory contains the augmented frames, while the labels directory stores the ground-truth marker locations, one marker per line, in the standard YOLO format: class_id x_center y_center marker_width marker_height. The first value (class_id) is always 0, since we are only considering a single class for this detection task. The following four values are normalized with respect to the image dimensions: x_center and marker_width are normalized by the image width, while y_center and marker_height are normalized by the image height. See the following example for clarification:

0 0.7353968232870102 0.599690580368042 0.1422140419483185 0.25524471071031357
0 0.4113114923238754 0.8842228227191501 0.031858795881271364 0.07755138079325358
0 0.4016574129462242 0.701622458299001 0.04431436359882355 0.08198304971059163
0 0.865299366414547 0.7644710514280532 0.03821879923343659 0.07337598270840115
0 0.2047746479511261 0.7245340400271946 0.12338865995407104 0.21934681998358832
0 0.32464619278907775 0.7738529602686565 0.03873543739318848 0.06867445309956868
0 0.740096740424633 0.21104383733537463 0.03647349178791046 0.05871715015835232
0 0.33322680816054345 0.13155997859107124 0.028741587698459626 0.051002592510647246
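
As a sketch (with hypothetical file names), a label line can be converted back to pixel coordinates by undoing the normalization described above:

import cv2

img = cv2.imread("det_luma/train/images/000000000089.jpg")
h, w = img.shape[:2]

with open("det_luma/train/labels/000000000089.txt") as f:
    for line in f:
        cls, xc, yc, bw, bh = (float(v) for v in line.split())
        # Undo the normalization and recover the bounding box corners
        x1, y1 = (xc - bw / 2) * w, (yc - bh / 2) * h
        x2, y2 = (xc + bw / 2) * w, (yc + bh / 2) * h
        print(int(cls), (x1, y1), (x2, y2))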

Finally, the regression datasets (such as reg_luma_bc.tar.gz) are provided as collections of crops (64 x 64 images), each one containing a marker from an image of the matching detection dataset (e.g., reg_luma_bc.tar.gz matches det_luma_bc.tar.gz). Labels for these crops are provided in CSV files, with separate files for the training and validation sets. The structure of these CSV files is the following: the first column contains the name of the crop, which corresponds to the source image name followed by a two-digit crop index. This is followed by the x and y coordinates of each of the four corners of the marker (normalized with respect to the crop size), the marker's rotation, and its ID. See the following example for clarification:

pic,c1_x,c1_y,c2_x,c2_y,c3_x,c3_y,c4_x,c4_y,rot,id
000000492129_00.jpg,0.40259319265856375,0.15452799356724128,0.1545299431229553,0.5946699372972077,0.5912928629425661,0.8454720064327587,0.8454700568770447,0.40691390662961124,3,43
000000492129_01.jpg,0.15401030670038238,0.4037121428420323,0.4583366336930896,0.845905562858047,0.8459896932996176,0.560064082549943,0.559349025823742,0.1540944371419531,2,65
000000492129_02.jpg,0.15134224899058682,0.46932990290960436,0.4602298844365988,0.8486538262764869,0.8486577510094132,0.5485276412548922,0.5280271849356097,0.15134617372351275,3,1
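
A minimal sketch for reading these labels with the standard csv module, assuming the layout shown above and that corner coordinates are normalized with respect to the 64 x 64 crop (the CSV file name is hypothetical):

import csv

with open("reg_luma_bc/train.csv") as f:  # hypothetical path to the training-set CSV
    for row in csv.DictReader(f):
        # Denormalize the four corners to pixel coordinates within the 64 x 64 crop
        corners = [(64 * float(row[f"c{i}_x"]), 64 * float(row[f"c{i}_y"])) for i in range(1, 5)]
        print(row["pic"], int(row["id"]), int(row["rot"]), corners)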

Code

To support the research community and encourage exploration, the code used to create the dataset is available in our GitHub repository.

Download our dataset

The pre-computed version of this dataset used in our work can be downloaded through Zenodo.

Read our work

A preprint of our paper can be accessed through arXiv.

Citing

If you use this work in your research, you must cite:

  1. Rafael Berral-Soler, Rafael Muñoz-Salinas, Rafael Medina-Carnicer, Manuel J. Marín-Jiménez, DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions, Image and Vision Computing, Volume 152, 2024, 105313, ISSN 0262-8856, https://doi.org/10.1016/j.imavis.2024.105313

Contact

If you have any further questions, please contact rberral@uco.es.