<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.2" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">Sensory Systems</journal-id><journal-title-group><journal-title>Sensory Systems</journal-title></journal-title-group><issn publication-format="print">0235-0092</issn><issn publication-format="electronic">3034-5936</issn><publisher><publisher-name>Russian Academy of Sciences</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.7868/S3034593625010078</article-id><title-group><article-title>Three-dimensional object detection based on an L-shape model in autonomous motion systems</article-title><trans-title-group xml:lang="ru"><trans-title>Трехмерная детекция объектов на основе L-shape модели в автономных системах движения</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><contrib-id contrib-id-type="orcid"></contrib-id><name-alternatives><name xml:lang="en"><surname>Chekanov</surname><given-names>M. O.</given-names></name><name xml:lang="ru"><surname>Чеканов</surname><given-names>М. О.</given-names></name></name-alternatives><email>mikhail.chekanov@evocargo.com</email><xref ref-type="aff" rid="aff-1"></xref><xref ref-type="aff" rid="aff-2"></xref></contrib></contrib-group><aff-alternatives id="aff-1"><aff><institution xml:lang="ru">Институт проблем передачи информации им. А.А. 
Харкевича Российской академии наук</institution><institution xml:lang="en">Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences</institution></aff></aff-alternatives><aff-alternatives id="aff-2"><aff><institution xml:lang="ru"></institution><institution xml:lang="en"></institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2025-01-01" publication-format="electronic"><day>01</day><month>01</month><year>2025</year></pub-date><fpage>66</fpage><lpage>78</lpage><abstract xml:lang="en"><p>The ability of automated vehicles (AVs) to determine the position of objects in three-dimensional space plays a key role in motion planning. Implementing algorithms that solve this problem is particularly difficult for systems that rely solely on monocular cameras, since estimating depth from a single image is a non-trivial task. Nevertheless, such systems are widespread due to their relatively low cost and ease of use. In this paper, we propose a method for determining the position of vehicles (the most common type of object in urban scenes) in the form of oriented bounding boxes in bird’s-eye view, based on an image obtained from a single monocular camera. The method consists of two steps. In the first step, a projection of the visible boundary of the vehicle into the bird’s-eye view is computed from 2D obstacle detections and roadway segmentation in the image. The resulting projection is assumed to represent noisy measurements of two orthogonal sides of the vehicle. In the second step, an oriented bounding box is constructed around the obtained projection. For this step, we propose a new box-fitting algorithm based on the L-shape model assumption. The algorithm was tested on a specially prepared real-world dataset. 
The proposed L-shape algorithm outperformed the best of the compared algorithms in terms of the Jaccard coefficient (Intersection over Union, IoU) by 2.7%.</p></abstract><trans-abstract xml:lang="ru"><p>Способность высокоавтоматизированных транспортных средств (ВАТС) определять положение объектов в трехмерном пространстве играет ключевую роль в планировании движения. Реализация алгоритмов, решающих данную задачу, особенно сложна для систем, использующих исключительно монокулярные камеры, так как оценка глубины представляет для них нетривиальную задачу. Тем не менее такие системы широко распространены ввиду относительной дешевизны и простоты эксплуатации. В данной статье мы предлагаем метод определения положения транспортных средств (наиболее распространенного типа объектов окружения в городских условиях) в виде произвольно ориентированных ограничивающих рамок на виде сверху (bird’s-eye view) по изображению, полученному с одной монокулярной камеры. Этот метод состоит из двух этапов. На первом этапе вычисляется проекция видимой границы транспортного средства на вид сверху на основе 2D-детекций препятствий и сегментации проезжей части на изображении. Предполагается, что полученная проекция представляет зашумленные измерения двух ортогональных сторон ТС. На втором этапе вокруг полученной проекции строится ориентированная ограничивающая рамка. Для этого этапа мы предлагаем новый алгоритм построения рамки, основанный на предположении об L-образности проекции: L-shape алгоритм. Тестирование алгоритма проводилось на самостоятельно подготовленном наборе реальных данных. 
Предлагаемый L-shape алгоритм превзошел лучший из сравниваемых алгоритмов по коэффициенту Жаккара (Intersection over Union, IoU) на 2.7%.</p></trans-abstract><kwd-group xml:lang="en"><kwd>3D object detection</kwd><kwd>L-shape</kwd><kwd>monocular object detection</kwd><kwd>autonomous driving</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>трехмерная детекция</kwd><kwd>L-shape</kwd><kwd>монокулярная детекция объектов</kwd><kwd>автономное пилотирование</kwd></kwd-group></article-meta></front><body></body><back><ref-list><ref id="B1"><label>B1</label><citation-alternatives><mixed-citation xml:lang="ru">Шипитько О. С., Тетерюков Д. О. Разработка алгоритма оценки пространственного положения коробок для автоматизации процесса формирования заказов на складах // Материалы VI Всероссийской молодежной школы по робототехнике. Волгоград: “Волгоградское научное издательство”, 2017. С. 9-18.</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B2"><label>B2</label><citation-alternatives><mixed-citation xml:lang="ru">Arnon D.S., Gieselmann J. P. A linear time algorithm for the minimum area rectangle enclosing a convex polygon, 1983.</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B3"><label>B3</label><citation-alternatives><mixed-citation xml:lang="ru">Billings G., Johnson-Roberson M. SilhoNet: An RGB method for 6D object pose estimation // IEEE Robotics and Automation Letters, 2019. V. 4(4). P. 3727-3734. DOI: 10.48550/arXiv.1809.06893</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B4"><label>B4</label><citation-alternatives><mixed-citation xml:lang="ru">Chen X., Kundu K., Zhang Z., Ma H., Fidler S., Urtasun R. Monocular 3D object detection for autonomous driving // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. P. 2147-2156. 
DOI: 10.1109/CVPR.2016.236</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B5"><label>B5</label><citation-alternatives><mixed-citation xml:lang="ru">Fan Z., Zhu Y., He Y., Sun Q., Liu H., He J. Deep learning on monocular object pose detection and tracking: A comprehensive overview // ACM Computing Surveys, 2022. V. 55(4). P. 1-40. DOI: 10.1145/3524496</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B6"><label>B6</label><citation-alternatives><mixed-citation xml:lang="ru">Geiger A., Lenz P., Stiller C., Urtasun R. Vision meets robotics: The KITTI dataset // The International Journal of Robotics Research, 2013. V. 32(11). P. 1231-1237. DOI: 10.1177/0278364913491297</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B7"><label>B7</label><citation-alternatives><mixed-citation xml:lang="ru">Jiang D., Li G., Sun Y., Hu J., Yun J., Liu Y. Manipulator grabbing position detection with information fusion of color image and depth image using deep learning // Journal of Ambient Intelligence and Humanized Computing, 2021. V. 12. P. 10809-10822. DOI: 10.1007/s12652-020-02843-w</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B8"><label>B8</label><citation-alternatives><mixed-citation xml:lang="ru">Kim Y., Kim J., Koh J., Choi J. W. Enhanced object detection in bird’s eye view using 3D global context inferred from lidar point data // 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019. P. 2516-2521. DOI: 10.1109/IVS.2019.8814276</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B9"><label>B9</label><citation-alternatives><mixed-citation xml:lang="ru">Kuhn H. W. The Hungarian method for the assignment problem // Naval Research Logistics Quarterly, 1955. V. 2(1-2). P. 83-97. 
DOI: 10.1002/nav.3800020109</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B10"><label>B10</label><citation-alternatives><mixed-citation xml:lang="ru">Labayrade R., Aubert D., Tarel J.P. Real time obstacle detection in stereovision on non-flat road geometry through “v-disparity” representation // Intelligent Vehicle Symposium, 2002. IEEE, 2002. V. 2. P. 646-651. DOI: 10.1109/IVS.2002.1188024</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B11"><label>B11</label><citation-alternatives><mixed-citation xml:lang="ru">Liu X., Xue N., Wu T. Learning auxiliary monocular contexts helps monocular 3D object detection // Proceedings of the AAAI Conference on Artificial Intelligence, 2022. V. 36(2). P. 1810-1818. DOI: 10.1609/aaai.v36i2.20074</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B12"><label>B12</label><citation-alternatives><mixed-citation xml:lang="ru">Liu Y., Geng L., Zhang W., Gong Y., Xu Z. Survey of video based small target detection // Journal of Image and Graphics, 2021а. V. 9(4). P. 122-134. DOI: 10.18178/JOIG.9.4.122-134</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B13"><label>B13</label><citation-alternatives><mixed-citation xml:lang="ru">Liu Z., Zhou D., Lu F., Fang J., Zhang L. AutoShape: Real-time shape-aware monocular 3D object detection // Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021б. P. 15641-15650. DOI: 10.1109/ICCV48922.2021.01535</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B14"><label>B14</label><citation-alternatives><mixed-citation xml:lang="ru">Sholomov D. L. Application of shared backbone DNNs in ADAS perception systems // ICMV, 2020. P. 1160525. 
DOI: 10.1117/12.2586932</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B15"><label>B15</label><citation-alternatives><mixed-citation xml:lang="ru">Smagina A.A., Shepelev D.A., Ershov E.I., Grigoryev A.S. Obstacle detection quality as a problem-oriented approach to stereo vision algorithms estimation in road situation analysis // Journal of Physics: Conference Series. IOP Publishing, 2018. V. 1096(1). P. 012035. DOI: 10.1088/1742-6596/1096/1/012035</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B16"><label>B16</label><citation-alternatives><mixed-citation xml:lang="ru">Tekin B., Sinha S.N., Fua P. Real-time seamless single shot 6D object pose prediction // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. P. 292-301. DOI: 10.1109/CVPR.2018.00038</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B17"><label>B17</label><citation-alternatives><mixed-citation xml:lang="ru">Wang H., Wang Z., Lin L., Xu F., Yu J., Liang H. Optimal vehicle pose estimation network based on time series and spatial tightness with 3D lidars // Remote Sensing, 2021. V. 13(20). P. 4123. DOI: 10.3390/rs13204123</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B18"><label>B18</label><citation-alternatives><mixed-citation xml:lang="ru">Wang P. Research on comparison of lidar and camera in autonomous driving // Journal of Physics: Conference Series. IOP Publishing, 2021. V. 2093(1). P. 012032. DOI: 10.1088/1742-6596/2093/1/012032</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B19"><label>B19</label><citation-alternatives><mixed-citation xml:lang="ru">Wu D., Liao M. W., Zhang W. T., Wang X.G., Bai X., Cheng W. Q., Liu W. L. YOLOP: You only look once for panoptic driving perception // Machine Intelligence Research, 2022. V. 19. P. 550-562. 
DOI: 10.1007/s11633-022-1339-y</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B20"><label>B20</label><citation-alternatives><mixed-citation xml:lang="ru">Yu Q., Araújo H., Wang H. A stereovision method for obstacle detection and tracking in non-flat urban environments // Autonomous Robots, 2005. V. 19. P. 141-157. DOI: 10.1007/s10514-005-0612-6</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B21"><label>B21</label><citation-alternatives><mixed-citation xml:lang="ru">Zhang Z., Weiss R., Hanson A. Qualitative obstacle detection // 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1994. P. 554-559. DOI: 10.1109/CVPR.1994.323881</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="B22"><label>B22</label><citation-alternatives><mixed-citation xml:lang="ru">Zhu Z., Zhang Y., Chen H., Dong Y., Zhao S., Ding W., Zhong J., Zheng S. Understanding the robustness of 3D object detection with bird’s-eye-view representations in autonomous driving // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023. P. 21600-21610. DOI: 10.1109/CVPR52729.2023.02069</mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref></ref-list></back></article>
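The abstract's second step, fitting an oriented bounding box around the bird's-eye-view boundary projection, can be illustrated with a generic baseline of the kind the paper compares against: a brute-force minimum-area rectangle fit over candidate headings (cf. the linear-time convex-hull method of Arnon and Gieselmann, ref. B2). This is a minimal sketch under that assumption, not the proposed L-shape algorithm; the function name and parameters are illustrative only.

```python
import numpy as np

def fit_oriented_box(points, angle_step_deg=1.0):
    """Fit a minimum-area oriented bounding box to 2D points by
    brute-force search over candidate headings in [0, 90) degrees.
    A generic baseline (search-based rectangle fitting), not the
    paper's L-shape algorithm."""
    best = None
    for deg in np.arange(0.0, 90.0, angle_step_deg):
        t = np.deg2rad(deg)
        # Rotation mapping world coordinates into the candidate box frame.
        rot = np.array([[np.cos(t), np.sin(t)],
                        [-np.sin(t), np.cos(t)]])
        p = points @ rot.T
        lo, hi = p.min(axis=0), p.max(axis=0)
        area = (hi - lo).prod()  # axis-aligned extent in the rotated frame
        if best is None or area < best[0]:
            best = (area, rot, lo, hi)
    _, rot, lo, hi = best
    # Axis-aligned corners in the box frame, rotated back to the world frame.
    corners = np.array([[lo[0], lo[1]], [hi[0], lo[1]],
                        [hi[0], hi[1]], [lo[0], hi[1]]])
    return corners @ rot  # rot is orthogonal, so its inverse is its transpose

# Samples along two orthogonal sides (an "L") of a 2 x 1 vehicle footprint.
pts = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [2.0, 0.0],
                [0.0, 0.3], [0.0, 0.6], [0.0, 1.0]])
box = fit_oriented_box(pts)
```

The area criterion ignores that the observed points cover only the two visible sides of the vehicle; the paper's L-shape algorithm is instead built on exactly that L-shaped structure of the projection, which is what the reported 2.7% IoU gain over the best compared method reflects.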