SubT Virtual Tunnel Circuit
Robotika Team
The DARPA Subterranean Challenge Virtual Tunnel Circuit was postponed from August to October 2019, so we decided to split the original Tunnel Circuit report into two. Update: 3/11/2019 — Robotika Virtual Videos
There are a couple of paragraphs related to the Virtual Tunnel Circuit in the
older report, in particular:
- Virtual Track - starting from 9th position ... (64 days) (12th June 2019)
- Freyja and Virtual Nightmare ... (21/35 days) (11th July 2019)
- Virtual too easy? Let's introduce launch changes ... (11+7/25 days) (21st July 2019)
- DARPA call, Freyja upside down, CYNET.ai (9+7/23 days) (23rd July 2019)
- Virtual Tunnel Circuit extended by two months! (8+7+60/23 days) (24th July 2019)
Content
- Restart (38 days)
- Accelerometer and local planner (36 days)
- Stone Age (34 days)
- Everything under control! (33 days)
- 250Hz IMU bottleneck (31 days)
- Robotika is leading the board! (29 days)
- Once upon a time ... (21 days)
- Fake it until you make it ... (18 days)
- Double Fake! (17 days)
- Ver1 = DRILL (16 days)
- Ver2 = ROS_MASTER_URI and ROS_IP (15 days)
- Ver3 = std2ros and ASCII art (14 days)
- Ver4 = wait_for_sensors and 0 points (13 days)
- Invisible Enemy = Ver5, 6, 6b, 7, 8, 9, 10? (12 days)
- ZeroMQ = Ver10, Ver11? Finally 2 points! (10 days)
- Ver13 = Two Robots (9 days)
- Ver14 = HD camera, debug_artf and log strip (8 days)
- Ver15? Red Noise (7 days)
- Ver16=another ASCII Art, Ver17=Comeback robot/refactoring (6 days)
- Ver18=Comeback robot again (5 days)
- Ver20 = Too many origins (4 days)
- Ver21 = base64 stdout logging (3 days)
- Ver22 = Hlava 22, hack log dump (2 days)
- No Silver Bullet, Ver23 = origin ERROR, rosout.log.2 (1 day)
- The Gods Themselves (0 days)
- The 5 stages DABDA or DARPA?
- 5 days after
- 11 days after
- Robotika Virtual Presentation
- Tunnel Circuit Awards Ceremony
- Robotika Wins 100000USD!
- Robotika Virtual Videos
1st September 2019 — Restart (38 days)
The System Track of the Tunnel
Circuit is over, but not the virtual one:
SubT Challenge Virtual Competition Teams,
We are excited to see your SubT Challenge innovations in the Virtual Tunnel
Circuit competition, and know your teams are hard at work testing and
developing novel solutions!
In accordance with our previous communications, the closing date for
submissions of solutions for the Virtual Tunnel Circuit is Wednesday, 09
October at 11:59PM AoE (https://en.wikipedia.org/wiki/Anywhere_on_Earth).
We highly encourage you to take advantage of the three Tunnel Practice
scenarios available for your use to test your entire solution workflow, to
include submission through the SubT Virtual Portal. Please be sure to test
early and often, and we continue to welcome your feedback to improve the
Virtual Competition experience.
Awards and winners will be announced on October 30th.
We appreciate your continued contributions to the SubT Challenge. We welcome
any questions you may have and we look forward to seeing your developments.
Sincerely,
The SubT Challenge Team
So what to do first?
Test first! …
it may work without you knowing it. I ran the test before we left for Pittsburgh,
but the log file was too big to play with:
-rw-r--r-- 1 gpu gpu 1.6G Aug 9 12:08 /home/gpu/subt-x2-190809_094511.log
so now there is a tool to strip the log down to only the necessary streams (see
PR #237). Filtered to pose2d
and lidar scan it was only 16MB
(subt-x2-190809_094511-strip.log)
… much better. I expected the X2 robot to hit the edges of the start area, but
somehow it recovered; even the entrance was visible via the 2D lidar and the robot
entered the tunnel! But see the „map” after a few traveled meters:
Somehow the IMU is no longer working as it used to?! So now I know the answer to
my original question „what to do first?” — investigate what happened to the IMU
(due to the transition from Gazebo 9 to the Ignition engine, or because of recent
upgrades from the System Track)…
p.s. there is another rule which says !!!do not trust a single measurement!!!
so I re-ran the test:
./run.bash robotika tunnel_circuit_practice.ign robotName1:=X2 robotConfig1:=X2_SENSOR_CONFIG_4
where the robotika docker image is no longer based on
nkoenig/subt-virtual-testbed (it is not available now??) but on the new
osrf/subt-virtual-testbed described in the updated tutorial. And guess what?
The robot did not reach the edge of the starting area this time. Hmm. They said
that the position of the robots is not going to be random … so what changed?
I still cannot understand how people can live (I mean develop robots)
without logfiles?! What do they do in this situation?
All I need to do is to run:
python -m subt replay subt-x2-190901_192059.log
and it fails, because in the feature/virtual-ignition branch I changed the command
go straight from 9 meters to 3 meters, based on a note I made in the
logfile last time … now everything passes fine … it is time to
download the large old logfile … done. Are you also curious?
It passes fine too! But …
SubT Challenge Ver2!
None
go_straight 3.0 (0, 0, 0) 0:00:00.170243 (0.0 0.0 0.0) [] 0:00:00.170243 []
maximum delay overshot: 0:00:00.120148
maximum delay overshot: 0:00:00.109536
maximum delay overshot: 0:00:00.128402
maximum delay overshot: 0:00:00.128848
maximum delay overshot: 0:00:00.107339
maximum delay overshot: 0:00:00.107657
so in the old image there was a delay of more than 0.1s in the control loop, which
is good news — it could be the reason for the strange IMU behavior (not properly
synced with the rest of the system), so maybe the newer docker image is better,
faster?!
How long did it take to analyze? 5 minutes? (yes, the slowest part was copying
1.6GB from one machine to another)
So let's try 4 meters …
Well, it will be „a lot of fun” — now it generated a set of collisions and
X2 tried to recover. But it collided even with „tunnel air” when it tried to enter the tunnel.
Time to check accelerometers!
- the complete log file subt-x2-190901_195952.log (400MB)
- video of how it ended: https://youtu.be/N1Om4LO73uE
3rd September 2019 — Accelerometer and local planner (36 days)
The progress yesterday was small, but at least
the
logfile confirmed the source of the false collision detection (see reported issue
#166).
In short, whenever robot X2 wants to go or stop, its acceleration (after removal
of the G-force) was larger than the original threshold of 12m/s^2. This is probably a
bug, because the robot has a limit of 7m/s^2, but in the meantime I increased the
threshold to 18m/s^2, so it probably hits the obstacles harder now.
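A minimal sketch of this kind of accelerometer-based collision check, under one possible interpretation (the gravity handling, thresholds and example values are only illustrative, not the exact OSGAR implementation):

import math

GRAVITY = 9.81    # m/s^2, removed from the measured acceleration magnitude
THRESHOLD = 18.0  # m/s^2, raised from the original 12 m/s^2

def is_collision(acc_xyz):
    # Return True if the acceleration (without gravity) exceeds the threshold.
    ax, ay, az = acc_xyz
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    return abs(magnitude - GRAVITY) > THRESHOLD

print(is_collision([30.0, 0.0, 9.8]))  # True, a hard hit
print(is_collision([1.0, 0.0, 9.8]))   # False, normal driving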
After the post Wolfgang sent me a note: You may already have noticed that
the angular velocity of X2 has changed again. The change in the last version of
the docker file led to silly dancing (turning right and turning left) of
the X2 robot in our tests :-( … hmm, see the already „resolved” issue
#161
Rotation Behavior Dependent on Robot Orientation … so the angular speed differs
based on the robot orientation?? Sigh, the System Track was easy.
One of the problems with the current code is the accumulated delay from the local
planner, causing a crash at the end. Practice Tunnel 1 contains many
heaps of stones and boards, and not all of them are visible on the 2D lidar. Also
only narrow alleys remain.
For convenience I also switched to X2 configuration #3 with a QVGA camera to
speed up the test cycle and reduce the log size, but then the first detected artifact is
not as nice:
The first detected artifact
Yes, another reason was that even the JPEG images are sometimes very large:
Connected by ('127.0.0.1', 54808)
Exception in thread Thread-17:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/developer/subt_ws/osgar/osgar/drivers/rosmsg.py", line 267, in run
    self.bus.publish('image', parse_jpeg_image(packet))
  File "/home/developer/subt_ws/osgar/osgar/drivers/rosmsg.py", line 90, in parse_jpeg_image
    assert size < 230467, size  # expected size for raw image during experiment
AssertionError: 232056
so maybe ROS and Ignition use different compression factors? Time to remove this
old assert (it used to verify that I am receiving compressed images).
5th September 2019 — Stone Age (34 days)
It looks like the organizers like stones — they are everywhere :-(. And sure
enough they are not big enough to be visible by the 2D lidar:
It is possible to detect collisions from the accelerometers, but at the moment (I
mean the current simulation docker) there are too many false alarms. I tried
practice tunnels 01 and 02 and very soon you approach a heap of stones … and
if you try the virtual STIX you hit a „virtual wall” even before entering
the tunnel!
0:00:03.057975 go_straight 7.0 (speed: 0.5) (0.0, 0.0, -0.7923794804054256)
0:00:03.151615 Collision! [18.273, 0.004, 18.782] reported: False
0:00:03.176237 Collision! [25.47, 0.0, 19.8] reported: False
0:00:38.168127 Collision! [-19.87, -0.011, 21.035] reported: True
Sigh. There is one piece of good news — so far the robot (only one, so I do not yet
have problem #168 reported yesterday with deploying multiple robots) is
placed at approximately (-5, 5, 0), so it is enough to code:
self.turn(math.radians(-45))
self.go_straight(7.0)
to enter the tunnel. Yeah, I know that it is not very sophisticated, but first
I need to get the thing working and score at least one „fake” point in the
cloud simulation. Somehow this simple goal is still far, far „in the
clouds”.
Maybe one more picture — an attempt to enter the virtual STIX tunnel:
6th September 2019 — Everything under control! (33 days)
p.s. well, maybe I should add a note about what you see and what you should see …
This video was generated from a hacked
osgar.tools.lidarview
with a few extra lines to
save
the pygame screen:
pygame.image.save(screen, "tmp.jpg")
and an OpenCV wrapper to save an AVI file. I even uploaded the
subt-x2-190906_185628.log
(175MB) logfile, so everyone can replay it. Note that the development is
currently in the git
feature/virtual-ignition
branch. The lidarview is „universal” and it was also used for analysis of
OSGAR robots in Pittsburgh — this means the parameters can be a bit
complicated, so I also added a simple helper subt/script/virtual-view.bat,
which contains only this line:
python -m osgar.tools.lidarview --pose2d app.pose2d --lidar rosmsg_laser.scan --camera rosmsg_image.image --keyframes detector.artf %1
So what do you see? In the top left corner is the ROS image from the camera, the green points
are ROS LIDAR scan points, and finally orange is the robot 2D pose with orientation
and its history/trace. The scale is in the bottom left corner (1 meter). Note
the stream name app.pose2d, so it is already the fused position from IMU and
odometry.
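For illustration, this is roughly the kind of fusion behind such a pose2d stream: the distance traveled from odometry combined with the yaw from the IMU. A simplified sketch, not the actual OSGAR code (class and method names are made up here):

import math

class PoseFusion:
    """Combine traveled distance (odometry) with heading (IMU yaw)."""
    def __init__(self):
        self.x, self.y = 0.0, 0.0
        self.yaw = 0.0             # latest yaw from the IMU, in radians
        self.last_odom_dist = None

    def update_yaw(self, yaw):
        self.yaw = yaw

    def update_odometry(self, dist):
        # dist = cumulative traveled distance reported by odometry
        if self.last_odom_dist is not None:
            step = dist - self.last_odom_dist
            self.x += step * math.cos(self.yaw)
            self.y += step * math.sin(self.yaw)
        self.last_odom_dist = dist

    def pose2d(self):
        return self.x, self.y, self.yaw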
Now what you should see, and definitely do not, … is that if you combine the
laser scan with the yaw angle from the IMU on a reasonably flat surface (this was
worldName:=simple_tunnel_01 added in
the
latest release 2019-09-05) the directions should be aligned! No wild
rotations!
What could be wrong? One obvious possibility is timing — you simply have the
LIDAR scan and the IMU yaw data from different times. So that is how I came to
issue #170 -
Irregular X2/odom — the timing of the IMU gyro and accelerometer is perfect but
odometry is two to four times slower. The OSGAR log file contains both: internal
streams (used in lidarview for example) as well as the raw communication with
„external devices” (in this case ROS). You can replay the processing of the ROS
IMU:
python -m osgar.replay --module rosmsg_imu subt-x2-190906_185628.log
or ROS odometry:
python -m osgar.replay --module rosmsg_odom subt-x2-190906_185628.log
and print the ROS timestamps in nanoseconds. For the IMU you get differences of 4000000ns,
which is 4ms … so the update rate is 250Hz? And Addisu Z. Taddese wrote that
odometry should be at 50Hz, so I should have 5 times more messages from the IMU
than from odometry … and I don't! Moreover, the irregularity is really bad:
python -m subt replay e:\osgar\logs\subt2\virtual\subt-x2-190906_161345.log
…
0:00:10.311060 acc
0:00:10.316274 rot
0:00:10.316339 acc
0:00:10.319772 rot
0:00:10.319824 acc
0:00:10.321231 pose2d
0:00:10.321453 pose2d
0:00:10.321734 pose2d
0:00:10.323033 rot
0:00:10.323096 acc
0:00:10.326389 rot
0:00:10.326485 acc
0:00:10.331349 rot
0:00:10.331419 acc
so here are pose2d messages 3 times in a row and no accelerometer or rotation
messages in between. Sigh. I think the cause is clear — it is necessary to down-sample
the IMU to a „reasonable” 50Hz and reduce the internal traffic. Ideally I would
have a „block update” of all sensors, but from ROS I have only independent
TCP connections …
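Down-sampling the 250Hz IMU to roughly 50Hz can be as simple as forwarding every 5th message. A sketch of the idea only, not the actual OSGAR driver code (class name and callback signature are assumptions):

class ImuDownsampler:
    """Forward only every N-th IMU message to reduce internal traffic."""
    def __init__(self, publish, ratio=5):  # 250Hz / 5 = 50Hz
        self.publish = publish  # callback taking one IMU message
        self.ratio = ratio
        self.counter = 0

    def on_imu(self, msg):
        self.counter += 1
        if self.counter % self.ratio == 0:
            self.publish(msg)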
8th September 2019 — 250Hz IMU bottleneck (31 days)
OK, the video from simple_tunnel_01 was not that funny … but it exposed an
interesting case
(#172): the
simulation can be fast enough that the TCP
communication becomes the bottleneck! In my case it is the IMU, which sends updates at 250Hz. It gets delayed,
and this is especially visible on the unit-test tunnels, i.e. the ones that are fast to
simulate. What happened was that after 2 minutes of simulation my IMU data were 10
seconds old! Somehow pose2d from normal odometry is much more reliable than
the position computed from the IMU:
What you see above is that when the robot turned by 180 degrees the IMU did not
notice it for another 10 seconds (the robot continued straight), which is why
the map is so bad.
What to do if your orientation is 10s old? You can stop and wait 10s.
Unfortunately the buffer is not flushed, so it will still be 10s old. We can
use the „fall-back” to version 0 — turn 180 degrees and follow the other
wall for the already traveled distance. With this strategy we could at least return
to the tunnel entrance (simple_tunnel_01). It worked for speed 0.5m/s, but
did not work for 2m/s. Maybe if I disable the LocalPlanner again … ???
An attempt with tunnel_circuit_practice_01 ended up on the robot's roof at
speed 1m/s, and it looks like slowing down to 0.5m/s does not help either.
p.s. well, it is not only the 10s old orientation, it is also the info that 10s
ago you probably hit something … no wonder the robot is upside down :-(
10th September 2019 — Robotika is leading the board! (29 days)
(with zero points, practice tunnel 01)
There is a new design of the SubT Challenge
Virtual Testbed Portal and the
Leader Boards are also active now! And
believe it or not, Robotika is in 1st place! … yeah, just kidding, … almost …
I wanted to make some bad jokes about the PR article
Autonomní
roboti z ČVUT vyhráli v USA soutěž pořádanou agenturou DARPA (Autonomous
robots from CTU won the U.S. contest organized by DARPA). I am glad that they
are more proud of beating us than of being in 3rd place next to Carnegie Mellon
and NASA/JPL/Caltech/MIT. … so we are winning in a similar way as we were
losing with 8 points in the last position in the Tunnel Qualification.
This time it is 0 points, just proof that we tried. There is potentially a long way
to go to score at least one point!
18th September 2019 — Once upon a time … (21 days)
Believe it or not, I still did not manage to get a single point after a week of
experiments! Yes, it is true that in the meantime we organized
Robotour in Deggendorf, but still …
something is very wrong and I am sure that it is some very stupid detail. :-(
See issue
#186.
So where are we? First of all, we have the IMU back at 50Hz
(#172), and locally it is
as nice as it used to be. Yeah, I still have to review whether the
collisions detected by the accelerometer can be reduced, but later. The
main goal is to get 1 point, version 0, as always.
What do I know so far? Nate mentioned an important „detail”: The robot must get
close enough to an artifact for it to be loaded before you can attempt to score
with the artifact. You can see parts of the tunnel loading as a robot moves
through the environment. Attempting to report an artifact that was never loaded
will fail. An artifact only has to be loaded once. … so that's why a debug
routine just entering the tunnel and reporting a previously detected (from an older
simulation) artifact did not work — so there is no such fast „development
cycle”. You have to wait a couple of minutes to see that it failed again. OK, it
definitely makes sense for the competition.
Then there is an issue with the keyword defining the Base Station: it used to
be BaseStation and now it is base_station. The recommended solution is
subt::kBaseStationName.
I also reviewed the artifacts — they are different compared to the
qualification times and also re-indexed.
But there are people trying to help me. Arthur directed me to the
subt_hello_world
development branch and Alfredo added a three-terminal solution:
Terminal 1
$ mkdir -p ~/subt_testbed && cd ~/subt_testbed
$ wget https://bitbucket.org/osrf/subt/raw/tunnel_circuit/docker/run.bash
$ chmod +x run.bash
$ ./run.bash osrf/subt-virtual-testbed:cloudsim_sim_latest cloudsim_sim.ign robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG1
Terminal 2
$ cd ~/subt_testbed/
$ ./run.bash osrf/subt-virtual-testbed:cloudsim_bridge_latest robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG1
Terminal 3
$ cd ~/subt_testbed/
$ ./run.bash osrf/subt-virtual-testbed:subt_solution_latest
… and it worked in the sense that it reported artifact:
[ INFO] [1568782543.383065601]: Starting seed competitor
[ INFO] [1568782543.398386348]: Waiting for /clock, /subt/start, and /subt/pose_from_artifact
[ INFO] [1568782543.639311235, 47.808000000]: Using robot name[X1]
[ INFO] [1568782543.781596453, 47.832000000]: Sent start signal.
[ INFO] [1568782565.031902152, 59.608000000]: Arrived at entrance!
[ INFO] [1568782565.269722166, 59.708000000]: Message from [base_station] to [X1] on port [4100]:
[report_id: 1 artifact { type: 4 pose { position { x: -8.1 y: 37 z: 0.004 } } } submitted_datetime { sec: 4798503 nsec: 310287108 } sim_time { sec: 59 nsec: 608000000 } run: 1 report_status: "scored" score_change: 1 ]
but to be honest I do not have a clue how this is done, where the source is, or even which tunnel this is?
Do you see the QVGA artifacts?
21st September 2019 — Fake it until you make it … (18 days)
... but it will not help you anyway!
The plan last night was simple: insert our simple 2-minute tunnel exploration
before the subt_seed example, and it should be enough to score 1 point,
as it worked locally:
run: 1 report_status: "scored" score_change: 1
but not on AWS (Amazon Web Services):
Hello md,
Your Tunnel Practice 1 simulation with id 15de7ba1-e825-4f39-b8d2-34f427e8416c has finished successfully.
Summary
{
  "sim_time_duration_avg": 0,
  "sim_time_duration_std_dev": 0,
  "real_time_duration_avg": 0,
  "real_time_duration_std_dev": 0,
  "model_count_avg": 0,
  "model_count_std_dev": 0
}
Regards,
Open Robotics Team
You can even download the ROS log, which looks very promising, except it was almost empty:
Empty ROS log files
Now my impression is that I have to „bundle” the bridge into my docker, maybe
… so far still 0 points. :-(
22nd September 2019 — Double Fake! (17 days)
If one fake does not work then double it! I wonder how the simulation of
seed2 ends on AWS — it is not necessarily a „fake attempt” to score 2
points on the Tunnel Practice 01 Leader
Board, but rather a way to get proof (?) that the scoring behaves differently than expected
(#186).
In short, you do not have to explore the tunnel, but … maybe this „feature”
is intended for the final Tunnel Circuit world??
What I did was guess the position of the Backpack artifact from this
trace:
The first attempt failed, but then a more proper measurement reduced the
position error and it passed. The docker image seed2 uses the original code but
reports two artifacts … and it should fail, because they are reported at the
tunnel entrance without any exploration. We will see.
On the other hand I also have good news — with some luck (but repeated
twice) X2 managed to navigate among the stones to the first Fire
Extinguisher and report its valid position.
Now there is only the small detail of putting these two parts together and
getting some real points in the AWS simulation. Here is the logfile
subt-x2-190922_182354.log
(213MB) and the video (QVGA, X2 config #3, sorry).
python3 -m subt run subt/subt-x2.json --side left --timeout 128 --walldist 0.9
23rd September 2019 — Ver1 = DRILL (16 days)
Today's update (45 min to midnight) will be short. The fake from yesterday scored
2 points, but there is not much excitement there (yeah, we are at the top, but it will
not work anywhere else). I did some tests in practice tunnel 02 and the
robot turns around at the first blockage:
For some parameters it tries to enter the smaller hole on the left, but then it
turns 180 degrees, or for larger parameters it turns around immediately.
I wondered why Kevin/Coordinated Robotics selected Simple Tunnel 02 for the
deployment test and now I know — it is a nice tunnel with artifacts near the
entrance. There is a drill and a survivor close to the start:
Drill
Survivor
The detectors are not tweaked yet, so the Survivor is not recognized at all and
the Drill was reported as a Valve (a small red thing). At least the drill is fixed
… I should say hacked. The same holds for the camera resolution — half a year
ago we used a high-resolution camera; now I am too lazy to copy larger log files,
so I use QVGA instead (for now). The logfile is here:
subt-x2-190923_193000.log
(286MB), and the QVGA video is here.
The simulation with ver1 is running now, but I guess it will fail again …
24th September 2019 — Ver2 = ROS_MASTER_URI and ROS_IP (15 days)
Well, there was micro progress
(#199):
the docker on AWS does not use the local IP address 127.0.0.1. So now there is
at least an indication that osgar tried to subscribe to and publish the basic
topics:
[rosmaster.master][INFO] 2019-09-24 10:05:48,127: +SUB [/X2/imu/data] /osgar_node http://10.36.176.3:8000
[rosmaster.master][INFO] 2019-09-24 10:05:48,144: +SUB [/X2/front_scan] /osgar_node http://10.36.176.3:8000
[rosmaster.master][INFO] 2019-09-24 10:05:48,160: +SUB [/X2/front/image_raw/compressed] /osgar_node http://10.36.176.3:8000
[rosmaster.master][INFO] 2019-09-24 10:05:48,177: +SUB [/X2/odom] /osgar_node http://10.36.176.3:8000
[rosmaster.master][INFO] 2019-09-24 10:05:48,192: +PUB [/X2/cmd_vel] /osgar_node http://10.36.176.3:8000
[rosmaster.master][INFO] 2019-09-24 10:05:48,220: publisherUpdate[/X2/cmd_vel] -> http://10.36.176.2:33337/ ['http://10.36.176.3:43523/', 'http://10.36.176.3:8000']
[rosmaster.master][INFO] 2019-09-24 10:05:48,221: publisherUpdate[/X2/cmd_vel] -> http://10.36.176.2:33337/ ['http://10.36.176.3:43523/', 'http://10.36.176.3:8000']: sec=0.00, result=[1, '', 0]
… but that is all :-(.
I am not sure if it is OK that all four teams in the simplest environment have 0 points?!
25th September 2019 — Ver3 = std2ros and ASCII art (14 days)
Another run failed, again … so let's see if any of the prepared „traps” contains
anything. The changes were:
- print robot position via ROS_INFO from the modified subt_seed example
- remove original publish of topic cmdVel to avoid collision in commands
- use a simple std2ros.py to redirect our Python3 stdout to ROS:
import sys
import rospy

rospy.init_node('mdtalker', anonymous=True)
for i, line in enumerate(sys.stdin):
    rospy.loginfo(str(i) + ': ' + line.strip())
Will it work? Or are all the problems on stderr?
Before I open „the treasure box”
(ver3-76640b06-334f-47eb-8dd9-b09202a14ca8.tar.gz and
ver3-76640b06-334f-47eb-8dd9-b09202a14ca8-X2.tar.gz) I would mention some
fun I had today. I am running simulations remotely (I mean really remotely, via
web, RasPI, MapFactor, GPU server, docker) and most of the time all I have is
a CLI (command line interface). I tried to run a 5-minute simulation of X2
at speed 1m/s and … it was hard to guess what the robot explored. So in 15
minutes I wrote the small osgar.tools.asciiart
python -m osgar.tools.asciiart subt-x2-190925_104718.log --step 3
(ASCII-art output: a rough closed loop of X characters tracing the traveled path)
It looks like Simple Tunnel 02 is a relatively small circle.
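The idea behind such an ASCII map is simple: quantize the recorded pose2d positions into a coarse grid and print a character for every visited cell. A minimal sketch of the approach (function name, input format and grid size are illustrative, not the actual osgar.tools.asciiart code):

def ascii_map(poses, step=3.0):
    """poses: iterable of (x, y) positions in meters; step: grid cell size."""
    cells = {(int(round(x / step)), int(round(y / step))) for x, y in poses}
    min_x = min(c[0] for c in cells)
    max_x = max(c[0] for c in cells)
    min_y = min(c[1] for c in cells)
    max_y = max(c[1] for c in cells)
    lines = []
    for j in range(max_y, min_y - 1, -1):  # y grows "up" on the screen
        line = ''.join('X' if (i, j) in cells else ' '
                       for i in range(min_x, max_x + 1))
        lines.append(line)
    return '\n'.join(lines)

# example: a small rectangular loop of poses
loop = [(x, 0) for x in range(20)] + [(19, y) for y in range(10)] + \
       [(x, 9) for x in range(20)] + [(0, y) for y in range(10)]
print(ascii_map(loop))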
OK, open the box!
[rosmaster.master][INFO] 2019-09-25 09:00:13,508: +SUB [/X2/imu/data] /osgar_node http://10.37.192.3:8000
09:00:13,527: +SUB [/X2/front_scan] /osgar_node http://10.37.192.3:8000
09:00:13,542: +SUB [/X2/front/image_raw/compressed] /osgar_node http://10.37.192.3:8000
09:00:13,562: +SUB [/X2/odom] /osgar_node http://10.37.192.3:8000
09:00:13,577: +PUB [/X2/cmd_vel] /osgar_node http://10.37.192.3:8000
09:00:13,590: publisherUpdate[/X2/cmd_vel] -> http://10.37.192.2:36357/ ['http://10.37.192.3:8000']
09:00:13,603: +PUB [/rosout] /mdtalker_72_1569427213447 http://10.37.192.3:42793/
09:00:13,605: publisherUpdate[/X2/cmd_vel] -> http://10.37.192.2:36357/ ['http://10.37.192.3:8000']: sec=0.02, result=[1, '', 0]
09:00:13,605: publisherUpdate[/rosout] -> http://10.37.192.2:44631/ ['http://10.37.192.2:34241/', 'http://10.37.192.2:41203/', 'http://10.37.192.2:36399/', 'http://10.37.192.2:42761/', 'http://10.37.192.2:46663/', 'http://10.37.192.2:36357/', 'http://10.37.192.2:43891/', 'http://10.37.192.2:39637/', 'http://10.37.192.2:37481/', 'http://10.37.192.2:44531/', 'http://10.37.192.2:41603/', 'http://10.37.192.2:46139/', 'http://10.37.192.3:33947/', 'http://10.37.192.3:42793/']
09:00:13,607: publisherUpdate[/rosout] -> http://10.37.192.2:44631/ ['http://10.37.192.2:34241/', 'http://10.37.192.2:41203/', 'http://10.37.192.2:36399/', 'http://10.37.192.2:42761/', 'http://10.37.192.2:46663/', 'http://10.37.192.2:36357/', 'http://10.37.192.2:43891/', 'http://10.37.192.2:39637/', 'http://10.37.192.2:37481/', 'http://10.37.192.2:44531/', 'http://10.37.192.2:41603/', 'http://10.37.192.2:46139/', 'http://10.37.192.3:33947/', 'http://10.37.192.3:42793/']: sec=0.00, result=[1, '', 0]
09:00:13,609: +SUB [/clock] /mdtalker_72_1569427213447 http://10.37.192.3:42793/
09:00:13,611: +SERVICE [/mdtalker_72_1569427213447/get_loggers] /mdtalker_72_1569427213447 http://10.37.192.3:42793/
09:00:13,613: +SERVICE [/mdtalker_72_1569427213447/set_logger_level] /mdtalker_72_1569427213447 http://10.37.192.3:42793/
Isn't it a problem that osgar_node does not have /rosout? Otherwise
I see both osgar_node and mdtalker, so let's continue …
rosout.log has 8MB and it looks interesting!
0.000000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(Controller)] [topics: /rosout] Waiting for /clock, /subt/start, and /subt/pose_from_artifact
375.900000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:111(Controller)] [topics: /rosout] Using robot name[X2]
Whooow! It took more than 6 minutes before it started?!
(#202)
375.908000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:146(Update)] [topics: /rosout] Sent start signal.
375.944000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:196(Update)] [topics: /rosout] MD robot pose -6 5 dist=61
Another one:
389.700000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/ src/subt_seed_node.cc:196(Update)] [topics: /rosout] MD robot pose -6 5.00004 dist=61.0004
Well, it is
double dist = pose.position.x * pose.position.x + pose.position.y * pose.position.y;
so it is actually the squared distance (for the pose (-6, 5): 36 + 25 = 61, while the real distance is sqrt(61) ≈ 7.8 m)!
5421.404000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/ src/subt_seed_node.cc:196(Update)] [topics: /rosout] MD robot pose -5.98597 4.99567 dist=60.7885
so the robot did not move much, but surprisingly the position was not stable?
No message from mdtalker :-(
p.s. the team Scientific Systems Company has 3 points and it is the leader again …
p.s.2 the osgar.tools.lidarview map looks much nicer than the ASCII art above …
26th September 2019 — Ver4 = wait_for_sensors and 0 points (13 days)
Bad, bad, bad :-(
Yesterday, during our regular SubT telco, a small script
wait_for_sensors.py was created,
which is based on Python2 ROS and waits for all the necessary sensors. If you
expect that it helped, then you are wrong. BTW note the 5 teams in Simple Tunnel
02 — the deadline is approaching …
ROS master log (filter for listener):
+PUB [/rosout] /listener_51_1569448646286 http://10.36.32.3:39067/
+SUB [/clock] /listener_51_1569448646286 http://10.36.32.3:39067/
+SERVICE [/listener_51_1569448646286/get_loggers] /listener_51_1569448646286 http://10.36.32.3:39067/
+SERVICE [/listener_51_1569448646286/set_logger_level] /listener_51_1569448646286 http://10.36.32.3:39067/
+SUB [/X2/imu/data] /listener_51_1569448646286 http://10.36.32.3:39067/
+SUB [/X2/front_scan] /listener_51_1569448646286 http://10.36.32.3:39067/
+SUB [/X2/front/image_raw/compressed] /listener_51_1569448646286 http://10.36.32.3:39067/
+SUB [/X2/odom] /listener_51_1569448646286 http://10.36.32.3:39067/
-SUB [/clock] /listener_51_1569448646286 http://10.36.32.3:39067/
-SUB [/X2/front/image_raw/compressed] /listener_51_1569448646286 http://10.36.32.3:39067/
-SUB [/X2/front_scan] /listener_51_1569448646286 http://10.36.32.3:39067/
-SUB [/X2/imu/data] /listener_51_1569448646286 http://10.36.32.3:39067/
-SUB [/X2/odom] /listener_51_1569448646286 http://10.36.32.3:39067/
-PUB [/rosout] /listener_51_1569448646286 http://10.36.32.3:39067/
-SERVICE [/listener_51_1569448646286/get_loggers] /listener_51_1569448646286 rosrpc://10.36.32.3:43559
-SERVICE [/listener_51_1569448646286/set_logger_level] /listener_51_1569448646286 rosrpc://10.36.32.3:43559
… now I guess that + is creation and - removal of published, subscribed or queried topics. This looks OK.
ROS master log for mdtalker (utility forwarding stdout to ROS)
+PUB [/rosout] /mdtalker_74_1569448647072 http://10.36.32.3:36669/
+SUB [/clock] /mdtalker_74_1569448647072 http://10.36.32.3:36669/
+SERVICE [/mdtalker_74_1569448647072/get_loggers] /mdtalker_74_1569448647072 http://10.36.32.3:36669/
+SERVICE [/mdtalker_74_1569448647072/set_logger_level] /mdtalker_74_1569448647072 http://10.36.32.3:36669/
… this also looks OK: it requests the output topic and asks about the verbosity level.
Finally osgar_node:
+SUB [/X2/imu/data] /osgar_node http://10.36.32.3:8000
+SUB [/X2/front_scan] /osgar_node http://10.36.32.3:8000
+SUB [/X2/front/image_raw/compressed] /osgar_node http://10.36.32.3:8000
+SUB [/X2/odom] /osgar_node http://10.36.32.3:8000
+PUB [/X2/cmd_vel] /osgar_node http://10.36.32.3:8000
… good, as expected, without any termination, so the Python3 program was probably killed together with the docker?
Empty roslaunch-sim-f94997ad-ae3a-4260-9a71-4835c0196b6d-comms-rbt1-52.log.
0.000000000 Node Startup
0.000000000 INFO /tf_world_static [/tmp/binarydeb/ros-melodic-tf2-ros-0.6.5/src/static_transform_broadcaster_program.cpp:142(main)] [topics: /rosout, /tf_static] Spinning until killed publishing world to simple_tunnel_02
0.000000000 INFO /mdtalker_74_1569448647072 [std2ros.py:10(std2ros)] [topics: /clock, /rosout] --—- BEGIN--—-
416.060000000 INFO /X2/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type
0.000000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:103(Controller)] [topics: /rosout] Waiting for /clock, /subt/start, and /subt/pose_from_artifact
416.556000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:111(Controller)] [topics: /rosout] Using robot name[X2]
Sigh … it looks like all ROS nodes have their own internal local time and they
are then put into one rosout.log file without sorting.
There is not a single record of listener (!!!), i.e. the intermediate
wait_for_sensors ROS code waiting for data via ROS_INFO.
(This is some indication that rewriting our code to Python2 ROS would not help much :-( )
What is confusing, on the other hand, is how mdtalker could report something if
it starts after listener?!
27th September 2019 — Invisible Enemy = Ver5, 6, 6b, 7, 8, 9, 10? (12 days)
This is the current status. Not much change on the Simple Tunnel 02 Leader
Board, but you can see on the right side that in the last 24 hours robotika released
not one but many docker images. Without any luck. So what is going on?
First of all I am convinced that there is a bug in the
simulation-bridge-solution triple docker (yeah, every piece of code has at least one
bug, right?). As proof I consider run ver6 and, with an identical docker,
ver6b. In order to handle the 6-minute initialization of the simulation
monster I wrote extra pure ROS code (see
wait_for_sensors.py)
which waits for each topic, i.e. for each sensor.
I must admit that I actually almost like the Python ROS code:
def wait_for(topic, topic_type):
    rospy.loginfo('Wait for ' + topic)
    rospy.wait_for_message(topic, topic_type)
    rospy.loginfo('Done with ' + topic)

def wait_for_sensors():
    rospy.init_node('mdwait', anonymous=True)
    wait_for_master()
    rospy.loginfo('--—- mdwait BEGIN--—-')
    wait_for('/X2/imu/data', Imu)
    wait_for('/X2/front_scan', LaserScan)
    wait_for('/X2/front/image_raw/compressed', CompressedImage)
    wait_for('/X2/odom', Odometry)
    wait_for('/X2/battery_state', BatteryState)
    rospy.loginfo('---mdwait END---')
… but it does not work, and YOU tell me how you would debug it?! And who
on the organizers' side will fix it in the last few remaining days? I suppose
that in ROS you can launch several nodes independently and that they do not
have to be specified in a single launch file??
roslaunch subt_seed x1.launch &
cd osgar
python subt/wait_for_sensors.py
python3 -m subt run subt/subt-x2.json --side left --walldist 0.75 --timeout 180 --note "try to visit artifact and return home" 2>&1 | python subt/std2ros.py
This was the run_solution.bash code … so subt_seed starts first and
wait_for_sensors.py starts in parallel.
This was the end of "ver6":
396.284000000 INFO /mdwait_71_1569502355368 [wait_for_sensors.py:24(wait_for)] [topics: /clock, /rosout] Wait for /X2/front/image_raw/compressed
… it was the first and also the last mdwait message in rosout.log!!!
Obviously ROS_INFO is useless in this situation if you do not get the messages from
the beginning.
Just for comparison, ver6b (identical docker, alternative run) had this output:
429.896000000 INFO /mdwait_71_1569527167188 [wait_for_sensors.py:24(wait_for)] [topics: /clock, /rosout] Wait for /X2/odom
429.996000000 INFO /mdwait_71_1569527167188 [wait_for_sensors.py:26(wait_for)] [topics: /clock, /rosout] Done with /X2/odom
429.996000000 INFO /mdwait_71_1569527167188 [wait_for_sensors.py:24(wait_for)] [topics: /clock, /rosout] Wait for /X2/battery_state
430.076000000 INFO /mdwait_71_1569527167188 [wait_for_sensors.py:26(wait_for)] [topics: /clock, /rosout] Done with /X2/battery_state
430.080000000 INFO /mdwait_71_1569527167188 [wait_for_sensors.py:38(wait_for_sensors)] [topics: /clock, /rosout] ---mdwait END---
431.580000000 INFO /X2/ros1_ign_bridge_twist [/tmp/ros1-ign-bridge-release/include/ros1_ign_bridge/factory.hpp:137(Factory::ros1_callback)] [topics: /rosout] Passing message from ROS1 geometry_msgs/Twist to Ignition ignition.msgs.Twist (showing msg only once per type
0.000000000 INFO /mdtalker_92_1569527168132 [std2ros.py:10(std2ros)] [topics: /clock, /rosout] --—- BEGIN--—-
… so in this run it continued, got a message from each topic and even started the
Python3 code, which had its output redirected to std2ros … and that was
it.
So what to do if even this does not work? The proposal on the forum was to put all
nodes into a single launch file, but would it work (especially with the later OSGAR
integration)? With PavelJ we discussed the option of using
ZeroMQ to provide an alternative proxy between ROS and the
OSGAR code. What seems to work is a single C++ ROS node (subt_seed_node).
Conclusion? I guess that the only solution, if we still want to use our OSGAR
solution, is to implement the proxy in C++ as a single ROS node.
I must admit that I really hate ROS. … it took me several hours to dig
out how to serialize an IMU ROS message. It changed between ROS versions, and
because the new code is automatically generated (as I found out later), all I
found was an obsolete sensor_msgs/Imu.h. Sigh. This is the solution:
void imuCallback(const sensor_msgs::Imu::ConstPtr& msg)
{
  ros::SerializedMessage sm = ros::serialization::serializeMessage(*msg);
  zmq_send(g_responder, sm.buf.get(), sm.num_bytes, 0);
}
On the other hand, the OSGAR Python code to interface ZeroMQ took, say, 15 minutes
and worked on Windows and then on Linux without a problem. So now I have some
serialized IMU data and I (naively) hope that it will be in the same format that
I was getting over TCP from ROS.
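The Python side can indeed stay small. Here is a sketch of a ZeroMQ receiver for the serialized messages, assuming a PUSH/PULL socket pair and a localhost endpoint; the real OSGAR driver, the endpoint and the socket types may differ (the C++ side above names its socket g_responder, which could also be a REP socket):

import zmq

def receive_loop(endpoint='tcp://localhost:5555'):
    context = zmq.Context()
    socket = context.socket(zmq.PULL)  # counterpart of the C++ zmq_send() side
    socket.connect(endpoint)
    while True:
        yield socket.recv()            # one serialized ROS message per packet

# usage: feed the raw packets into the existing rosmsg parsing code, e.g.
# for raw in receive_loop():
#     handle_imu(raw)  # handle_imu is a placeholder name, not an actual function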
29th September 2019 — ZeroMQ = Ver10, Ver11? Finally 2 points! (10 days)
Just a short report — I dropped the old ROS proxy and replaced it with the ZeroMQ
interface (two pipe files). Again I was angry, this time with the inability to
deserialize the ROS message geometry_msgs::Twist. Sigh, but this
time I threw it away and used just a string with two float numbers, speed
and angular_speed. Many rude words … if somebody is convinced that the more
templates and streams you add to send a few bytes the better … ROS is probably not
for me.
The simulation is running now. Again it was not keeping up, so I had to down-sample
the lidar updates to 10Hz, which is not good, but I really would like to see
at least one point in Simple Tunnel 02. Then it will be time for profiling,
maybe. Or fixes to distinguish fire extinguisher, drill and backpack. Or two
robots. Or dangerous tilt. Or integration of stone avoidance based on the camera.
Or … tons of things … 10 days … easy. Let's go to sleep (after I complete the
docker image, upload it and start the simulation ver). You can
check the results yourselves.
p.s. the next morning …
… so the next step is probably cleanup and sending stdout via ZeroMQ to
learn what is going on there anyway …
778.704000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:232(CommClientCallback)] [topics: /rosout, /X2/cmd_vel] Message from [base_station] to [X2] on port [4100]:
[report_id: 1 artifact { type: 2 pose { position { x: 22.79 y: 2.76 } } } submitted_datetime { sec: 959 nsec: 415188345 } sim_time { sec: 778 nsec: 540000000 } run: 1 report_status: "scored" score_change: 1 ]
778.708000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:232(CommClientCallback)] [topics: /rosout, /X2/cmd_vel] Message from [base_station] to [X2] on port [4100]:
[report_id: 2 artifact { type: 7 pose { position { x: 43.45 y: 42.83 } } } submitted_datetime { sec: 959 nsec: 416012212 } sim_time { sec: 778 nsec: 540000000 } run: 1 report_status: "scored" score_change: 1 ]
30th September 2019 — Ver13 = Two Robots (9 days)
Tonight I was testing a combination of two robots — the good old fellows X2L
and X2R, i.e. robot X2 following the left and the right wall. It was necessary to
extend the C++/Python3 communication with the origin pose of the robot, and after
discovering that the second robot starts at:
Origin: [-6.0, 3.0, 0.0615] Use right wall: False
I started to use the name to distinguish on which side the robot should move:
Origin: [-6.0, 3.0, 0.0615] X2R Use right wall: True
(originally I wanted to use whether it is on the left or right side of the tunnel
center axis, but both robots start on the left side)
I am not sure about the results yet — both robots should detect the drill in
the entrance tunnel and then X2R should fail to recognize the extinguisher
… to be fixed tomorrow. X2L should work as the former X2, except that the
rotation to the tunnel center and the distance traveled are properly computed from the robot
origin.
p.s. it happened to me a couple of times that there were two Python3 controllers
running in parallel and crashing the robot … I am kind of surprised that they
co-exist …
1st October 2019 — Ver14 = HD camera, debug_artf and log strip (8 days)
Today it is so far just maintenance — I switched to X2_SENSOR_CONFIG_4
with the HD camera. I liked QVGA due to the reasonably small log files, but it was hard
even for me to recognize the few-pixel objects. :-(
So now even for a 100-second exploration I get a 1.7GB log file (for each robot),
while for QVGA it was around 200MB. In order to „survive” I am now trying
to use osgar.tools.strip to remove all communication with ROS and the
rosmsg.image stream (BTW the size of the JPEG images is roughly double that of the original
Gazebo, but in both cases it is "quality 80", so I do not know what else
changed). On the other hand I introduced the detector.debug_artf stream, which
contains all pictures selected as potential artifacts, ready for classification.
That is the part I want to fix tonight, i.e. reliably distinguishing a fire
extinguisher from a drill. It should be trivial, but it was not in QVGA, so we
will see from the logfiles with the HD camera. The final logfile has 12MB, which could
even be an e-mail attachment … so far so good.
python3 -m osgar.tools.strip zmq-subt-x2-191001_101528.log --remove receiver.raw rosmsg.image
-rw-r--r-- 1 developer developer 167M Oct 1 06:51 zmq-subt-x2-191001_044323.log
-rw-r--r-- 1 developer developer 1.6G Oct 1 07:05 zmq-subt-x2-191001_045510.log
-rw-r--r-- 1 developer developer 1.7G Oct 1 07:21 zmq-subt-x2-191001_051055.log
-rw-r--r-- 1 developer developer 12M Oct 1 15:08 zmq-subt-x2-191001_101528-strip.log
-rw-r--r-- 1 developer developer 1.7G Oct 1 12:26 zmq-subt-x2-191001_101528.log
I would also mention the simulation speed change on AWS when going from one
robot to two. Originally it was faster than reality; for example for ver12
I got a mail with these simulation times:
{ "sim_time_duration_avg": 3600, "sim_time_duration_std_dev": 0, "real_time_duration_avg": 2067, "real_time_duration_std_dev": 0, "model_count_avg": 1, "model_count_std_dev": 0 }
i.e. 1 hour of simulation took around 35 minutes. For ver13 (two robots) it was closer to
reality:
{ "sim_time_duration_avg": 3600, "sim_time_duration_std_dev": 0, "real_time_duration_avg": 4020, "real_time_duration_std_dev": 0, "model_count_avg": 2, "model_count_std_dev": 0 }
i.e. that 1 hour took an extra 7 minutes = good! Now I also have a guess what
model_count_avg means … it is the number of deployed robots, right?
p.s. joke of the day — „Is that with the latest Docker images from
yesterday?” is a sentence I noticed in issue
#216 — WHAT?!
There has been a
new
release and nobody knows about it?!
p.s.2 now we can run 10 simulations in parallel, see
#214
p.s.3 the GPU server just died today (disk full) so remote access failed
… here is a bad documentation photo; at least the minute of death is clear …
the clock stopped.
Disk Full = Dead
p.s.4 Stripped logfiles with detector.debug_artf stream:
2nd October 2019 — Ver15? Red Noise (7 days)
It turned out that the artifact classification failed primarily due to red
noise — many red dots in the image which I originally had not noticed.
That was also the reason (after the machine recovery) why I did not publish ver14
last night.
The cleaning solution is relatively simple — just call erode with a suitable kernel:
import numpy as np
import cv2

kernel = np.ones((5, 5), np.uint8)
not_mask = np.logical_not(mask)
img2 = img.copy()
img2[mask] = (255, 255, 255)
img2[not_mask] = (0, 0, 0)
img2 = cv2.erode(img2, kernel, iterations=1)
I just do not want to call this on every image, only on some potentially
interesting ones. So the filtering is postponed to the last moment, when I have to
decide which type of artifact to report.
The red dots were so invisible (even with artificial color) that I had to list
them in a text file and try a black and white image for reference. Now, as I think
about it, the counting could be simplified and all the old bounding boxes etc. can
also be postponed to the last moment! I should use "slow" blob detection on
the final image and then it should be easy, right?
BTW another „detail” difference between the qualification Gazebo and the competition
Ignition: the artifacts used to be much brighter (red > 100) but now it
would not work for a threshold higher than 40 …
Today I also noticed twice that my robot X2R did not leave the start
position (0, 0, 0). Once it was due to a typo in the configuration (see
#217,
X2_SENSOR_CONFIG4 vs. X2_SENSOR_CONFIG_4), but I am not sure about the
second attempt … maybe I was just not patient enough, because the right
robot is supposed to start 10 seconds later.
p.s. I made a very wrong assumption that the red noise is negligible compared to the
object itself. It is not, so the original implementation looking for the local
maxima as the best view was misleading. Now I use the filter for all images
before counting red pixels. Moreover there was a bug that I re-analyzed
not the best image but the image after the end of the locality, which was
typically empty! That was still not enough — there were some larger
particles left, so it was necessary to select only the largest blob … fixed,
and it should be released as ver15.
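Selecting the largest blob can be done with OpenCV connected components. A rough sketch of the idea, not the exact detector code (the function name and the 0/255 mask convention are assumptions):

import numpy as np
import cv2

def largest_blob(mask):
    """Keep only the largest connected component of a binary mask (uint8, 0/255)."""
    count, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if count <= 1:
        return mask  # nothing but background
    # stats[0] is the background; find the biggest remaining component
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)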
p.s.2 !!! F I N A L L Y !!!
Note that this simulation was also much faster — I added a call of
/subt/finish at the end, and the run is reduced to 100 seconds of
exploration:
{ "sim_time_duration_avg": 246, "sim_time_duration_std_dev": 0, "real_time_duration_avg": 373, "real_time_duration_std_dev": 0, "model_count_avg": 2, "model_count_std_dev": 0 }
3rd October 2019 — Ver16=another ASCII Art, Ver17=Comeback robot/refactoring (6 days)
This morning I celebrated 3 points with a bit of fun. As there is no
reasonable way to get binary data from the robot simulated on AWS to the
developer, and last night I was fighting with stupid bugs in the artifact
detection, I decided to dump the binary sub-images (masks) as more ASCII art.
Yes, going back to the 80's — would you recognize this artifact?
306.388000000 INFO Python3: stdout …………………………..XXX…….. 306.388000000 INFO Python3: stdout …………………XXXXXXXXXXXXXXXXXXXXX. 306.388000000 INFO Python3: stdout …………….XXXXXXXXXXXXXXXXXXXXXXXXXXX 306.388000000 INFO Python3: stdout …………….XXXXXXXXXXXXXXXXXXXXXXXXXXX 306.388000000 INFO Python3: stdout …………XXX.XXXXXXXXXXXXXXXXXXXXXXXXXXX 306.388000000 INFO Python3: stdout …..XXXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXX 306.388000000 INFO Python3: stdout …XXXXXXXXXXXXXXXXXXXXXXX……..XXXXXXXXX 306.388000000 INFO Python3: stdout …XXXXXXXXXXXXXXXXXXXX…………XXXXXXXX 306.388000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXX………XXXXXXXX 306.388000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXX………XXXXXXXX 306.392000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXXXXXXX..XXXXXXXXXX 306.392000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 306.392000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 306.392000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 306.392000000 INFO Python3: stdout …XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 306.392000000 INFO Python3: stdout …XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 306.392000000 INFO Python3: stdout ….XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. 306.392000000 INFO Python3: stdout …..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. 306.392000000 INFO Python3: stdout …..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.. 306.396000000 INFO Python3: stdout ………XXX.XXXXXXXXXXXXXXXXXXXXXXXXXX…. 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXXXXXX…… 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXXXX…….. 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXXX……… 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXXX……… 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXX………. 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXX………. 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXX………. 306.396000000 INFO Python3: stdout …………..XXXXXXXXXXXXXXXXXXX………. 306.396000000 INFO Python3: stdout ………….XXXXXXXXXXXXXXXXXXXX………. 306.396000000 INFO Python3: stdout ………….XXXXXXXXXXXXXXXXXX………… 306.396000000 INFO Python3: stdout ………….XXXXXXXXXXXXXXXX………….. 306.396000000 INFO Python3: stdout …………XXXXXXXXXXXXXXX……………. 306.396000000 INFO Python3: stdout …………XXXXXXXXXXXXXXX……………. 306.396000000 INFO Python3: stdout …………XXXXXXXXXXXXXXX……………. 306.400000000 INFO Python3: stdout …………XXXXXXXXXXXXXXX……………. 306.400000000 INFO Python3: stdout ………..XXXXXXXXXXXXXXXX……………. 306.400000000 INFO Python3: stdout ………..XXXXXXXXXXXXXXXX……………. 306.400000000 INFO Python3: stdout ………..XXXXXXXXXXXXXXXX……………. 306.400000000 INFO Python3: stdout ………..XXXXXXXXXXXXXXXX……………. 306.400000000 INFO Python3: stdout ………..XXXXXXXXXXXXXXX…………….. 306.400000000 INFO Python3: stdout ………..XXXXXXXXXXXXXXX…………….. 306.400000000 INFO Python3: stdout ………..XXXXXXXXXXXXXXX…………….. 306.400000000 INFO Python3: stdout ……….XXXXXXXXXXXXXXX……………… 306.400000000 INFO Python3: stdout ……….XXXXXXXXXXXXXXX……………… 306.400000000 INFO Python3: stdout ……….XXXXXXXXXXXXXX………………. 306.400000000 INFO Python3: stdout ……….XXXXXXXXXXXXXX………………. 306.400000000 INFO Python3: stdout ……….XXXXXXXXXXXXXXX……………… 306.400000000 INFO Python3: stdout ……….XXXXXXXXXXXXXXX……………… 306.400000000 INFO Python3: stdout ………XXXXXXXXXXXXXXXX……………… 306.400000000 INFO Python3: stdout ……..XXXXXXXXXXXXXXXXXXXXXXX………… 306.400000000 INFO Python3: stdout …….XXXXXXXXXXXXXXXXXXXXXXXXX……….. 306.400000000 INFO Python3: stdout ….XXXXXXXXXXXXXXXXXXXXXXXXXXXXX………. 
306.400000000 INFO Python3: stdout …XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX…… 306.400000000 INFO Python3: stdout …XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX….. 306.400000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX…. 306.400000000 INFO Python3: stdout ..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX…. 306.400000000 INFO Python3: stdout XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX… 306.400000000 INFO Python3: stdout XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX… 306.400000000 INFO Python3: stdout XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX… 306.400000000 INFO Python3: stdout XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX… 306.400000000 INFO Python3: stdout XXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXX… 306.400000000 INFO Python3: stdout XXXXXXXXXXXX….XXXXXXXXXXXXXXXXXXXXXXXX… 306.400000000 INFO Python3: stdout XXXXXXXXX……..XXXXXXXXX.XXXXXXXXXX…… 306.400000000 INFO Python3: stdout .XXXX……………………………….. 306.400000000 INFO Python3: stdout .XXX………………………………… 306.400000000 INFO Python3: stdout 1558 43 65 1237 1280 0.6615384615384615 0.3570661896243292 306.400000000 INFO Python3: stdout h/w 1.5116279069767442 306.400000000 INFO Python3: stdout 0:05:01.869666 Relative position: 2175 -3487 4637 TYPE_DRILL
There is a stupid "…" web replacement so you do not see what I see, but I do not
have enough energy to fix it now :-(.
That was Ver16.
The plan for Ver17 is to allow the robot multiple entries — up to now we used the
maximal time for exploration and returned just in time to report all artifacts
at the Base Station. Now we have 1 hour for exploration and there is quite a
high probability that the robot fails somewhere in the deep dungeons. The plan
is to return more often with the discovered artifacts and re-enter the tunnel. But
… it requires some refactoring, as the Python3 and the old C++ ROS code used to
communicate via a file containing the list of detected artifacts:
developer@gpu-desktop:~/subt_solution/osgar$ cat call_base.txt
TYPE_DRILL 19.32 1.52 0.06
TYPE_EXTINGUISHER 24.36 2.74 0.06
TYPE_DRILL 24.51 1.38 0.06
As soon as the C++ ROS code discovered this file, it parsed it, reported it, and the
program terminated.
Also the query for the position has to be re-triggered … hmm, there is probably
no other way around it.
4th October 2019 — Ver18=Comeback robot again (5 days)
No fun today. Ver17 failed, twice. In the first run the X2L robot scored 1
point and detected only the drill. X2R recognized and localized both artifacts
but did not report the result at all — 0 points. Sigh. The re-run even failed to
download the log files from AWS.
Now I am practicing the back-and-forth motion; the scenario is „encoded” in the
robot name. The first test was X0_10_20L: wait 0 seconds, then explore the
tunnel for 10 seconds, return to the entrance and try a second time for 20
seconds. It worked after many iterations. The origin is specified as a
PoseStamped,
described as position + orientation as a quaternion. Now X0_40_100_300L is
running, which should explore Simple Tunnel 02 three times and
always add one artifact. If it passes I may try
X0_40_100_300_40_100_300_40_100_300_40_100_300_40_100_300_40_100_300L.
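Parsing such a scenario name can look roughly like this. A sketch of the idea, assuming the pattern <prefix><wait>_<t1>_<t2>…<L|R>; the real code may differ, and the later versions use 'F' instead of '_' as a separator:

import re

def parse_scenario(robot_name):
    """Split e.g. 'X0_10_20L' into (wait_time, [explore_times], side)."""
    m = re.match(r'^[A-Z](\d+(?:_\d+)*)([LR])$', robot_name)
    assert m is not None, robot_name
    times = [int(t) for t in m.group(1).split('_')]
    side = 'left' if m.group(2) == 'L' else 'right'
    return times[0], times[1:], side

print(parse_scenario('X0_10_20L'))       # (0, [10, 20], 'left')
print(parse_scenario('X0_40_100_300L'))  # (0, [40, 100, 300], 'left')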
p.s. I completely forgot to mention why the Comeback robot … it was based on the
movie Vratné láhve (Empties)
and its „návratový typ” (returnable type) …
p.s.2 So maybe there will be no X0_40_100_300L tonight …
p.s.3 released ver19 and the code X0F40F100F300L
24-letter limit
… well, double check the assumptions.
5th October 2019 — Ver20 = Too many origins (4 days)
Yesterday I made a mistake. Or two. The first one is related to
/subt/pose_from_artifact — it is OK to query it when the robot is
initialized for the first time, so I know the global offset between the robot's initial
local position (0, 0, 0) and the global (0, 0, 0) at the tunnel entrance. But it is
not OK when the robot returns and updates its origin, sigh. I still keep the
robot positions in local coordinates starting from (0, 0, 0), but the offset
gets replaced by this new origin. So it would be better to keep the original
origin.
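In other words, remember the first reported origin and always convert local coordinates with it. A tiny sketch of the intended fix (class and method names are illustrative; orientation offset is ignored for simplicity):

class OriginTracker:
    """Convert local robot coordinates to global ones using the first origin only."""
    def __init__(self):
        self.origin = None  # global pose of the local (0, 0, 0)

    def set_origin(self, origin_xyz):
        # ignore later origin updates (e.g. when the robot re-enters the tunnel)
        if self.origin is None:
            self.origin = origin_xyz

    def to_global(self, local_xyz):
        ox, oy, oz = self.origin
        x, y, z = local_xyz
        return ox + x, oy + y, oz + z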
I wanted to collect some more data, so I reduced the log files already during
recording (before, I was removing the streams receiver.raw and rosmsg.image
afterwards, which took up something like 98%) … but the behavior changed … the SIMULATION behavior
changed! The robot collided at the tunnel entrance as it tried to stop from
moving at 1m/s:
Collision at the entrance
It was possible to repeat it 2/2; the accelerometer X value was -18.055m/s^2 and the
threshold for collision was 18. So I changed the limit to 20.
I probably made the second mistake last night when I was uploading ver18
and later ver19, replacing "_" with "F" …
But it is not easy to find out what went wrong … my guess is that I filled
the docker image into the form as a copy and paste of docker push
200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika:ver19 instead of just
200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika:ver19 …
… and here are new debug outputs with multiple returns in Simple Tunnel 02:
- logfile: zmq-subt-x2-191005_172226.log (27MB)
- video: Virtual SubT Robot X0F40F100F400L
6th October 2019 — Ver21 = base64 stdout logging (3 days)
I could not sleep well this morning, so I got up and started coding. What I had on
my mind was a dump of the log as base64 to the ROS stdout. This would have been crazy
before, but it became more reasonable with the reduced logs. I wrote a simple OSGAR
driver stdin, which reads text lines, optionally adds a line number and sends
them to the system. Then I plugged in rosmsg and zeromq and it should stream
standard input all the way to ROS. Yes, I had to tweak record.py so that it
will not record infinitely, but only until the last module finishes … ready to
test BUT:
martin@temp:~$ ssh -p 2200 gpu@localhost
ssh: connect to host localhost port 2200: Connection refused
martin@temp:~$
The remote machine is not available! F* … it looks like I will rest for the rest
of Sunday! Last night I did a test with a simulation of four X2 robots, just to see
how it behaves, but there was enough space and it should have already terminated
… hmm, … bad, bad, bad …
Here you can see at least some pictures of the other X2 robots (they were
detected as a red artifact, i.e. another error to be fixed):
p.s. it looks like the bridge docker has some memory leak?? That was the only
docker I am aware of leaving running last night … and the machine was heavily
swapping now … hard shutdown.
7th October 2019 — Ver22 = Hlava 22, hack log dump (2 days)
Last night, together with PavelJ, we failed to get the version with ZeroMQ to
re-enter the originally opened pipeline and dump the log file. Sigh. On the other
hand it worked if we were still running the basic navigation code … and it is
surely some stupid bug, as usual.
So it was time for an even more desperate solution:
def dumplog(self):
    import os
    filename = self.bus.logger.filename  # deep hack
    self.stdout("Dump Log:", filename)
    size = statinfo = os.stat(filename).st_size
    self.stdout("Size:", )
    with open(filename, 'rb') as f:
        for i in range(0, size, 100):
            self.stdout(i, f.read(100))
    self.stdout("Dump END")
… there was ""base64" for short moment, but then print of binary string
works, so maybe … later. I tried X0F40L and it dumped something! Note
that it is dumping OSGAR log file while it is still writing this output to the
same logfile — that is the motivation for file size at the beginning. Also
I am aware that the last 100bytes will not be complete … but I expect holes
inside so few bytes at the „end” do not matter much. And now I have
historically the very first log file from AWS!
Yeah, just to make sure I am able to decode it, right? But in the meantime I
triggered 5 simulations with the robots:
- A0F40F100F400L
- B10F40F100F400R
- C1200F40F100F400L
- D1220F40F100F400R
on docker 200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika:ver22.
Besides logging, it provides a fail-safe as in the System Track: if the pitch angle
is bigger than 20 degrees, try to return home a little and then continue with the
original side of the wall.
There was another joke — I was able to trigger a simulation with 5
robots! The credit system allows it (there are 1000 credits and the robot X2 in
configuration 4 with lidar and HD camera „costs” only 180 credits):
{ "sim_time_duration_avg": 2532, "sim_time_duration_std_dev": 0, "real_time_duration_avg": 8057, "real_time_duration_std_dev": 0, "model_count_avg": 5, "model_count_std_dev": 0 }
Now I am anxious to see the first recovered logfile … so an update probably
later today … here are some local simulations:
- zmq-subt-x2-191006_213808.log (27MB) — practice tunnel 01, sees extinguisher and backpack but fails on 2nd crossing
- zmq-subt-x2-191006_190653.log (18MB) — practice tunnel 03, old code, navigates through narrow passage and fails at 3rd crossing
- zmq-subt-x2-191006_211831.log (12MB) — practice tunnel 03, pitch limit 20deg, fails to enter tunnel on the first crossing
Do you see the artefact? Robot does …
p.s. the first observation:
0 356.072000000 INFO /subt_solution [/home/developer/subt_solution/src/subt_seed/src/subt_seed_node.cc:393(getSpeedCmd)] [topics: ] Python3: stdout Size:
… somebody forgot to print the value …
p.s.2 the AWS/ROS log file is not complete … it was pushed to OSGAR but not all
the way to ROS … i.e. waiting for completion is necessary. Downloaded 3672700
bytes. But still, not too bad!
… so 48 hours before the end of the competition I can start to work the way I am
used to … great! The hacked conversion utility is here:
aws2log.py.
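It is not reproduced here verbatim; the following is only a sketch of the idea, assuming each dumped chunk shows up in rosout.log as a line ending with "stdout <offset> b'…'" (the format produced by dumplog above) — the helper names are made up:

    import ast
    import re
    import sys

    # hypothetical pattern matching lines like "... Python3: stdout 1200 b'\x00\x01...'"
    CHUNK_RE = re.compile(r"stdout (\d+) (b['\"].*)$")

    def recover(rosout_path, out_path):
        with open(rosout_path, 'r', errors='replace') as src, open(out_path, 'wb') as out:
            for line in src:
                m = CHUNK_RE.search(line.rstrip())
                if m is None:
                    continue
                offset = int(m.group(1))
                data = ast.literal_eval(m.group(2))  # bytes literal as printed by Python
                out.seek(offset)        # write each chunk back at its original offset
                out.write(data)

    if __name__ == '__main__':
        recover(sys.argv[1], sys.argv[2])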
8th October 2019 — No Silver Bullet, Ver23 = origin ERROR, rosout.log.2 (1 day)
There is No Silver Bullet
which could now lead to a major software progress „jump”. Ver23 contains
two bugfixes and one experiment:
- if the request for the robot origin fails, do not repeat the query but send an origin ERROR message to Python3 and pretend that you are already at the tunnel entrance
- if the artifact is in -20 < ax < 0 and -10 < ay < 10, do not report it. It is probably another red X2 robot (see the sketch below)
- the wall distance was changed from 0.9 meters to 0.5 meters so the robot at least tries to enter the narrow stone passage in Practice Tunnel 02, where Kevin with drones already scored 2 points
And that is it.
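A minimal sketch of the second item, the red-robot filter (function names are made up; only the rectangle is from the list above):

    def looks_like_other_robot(ax, ay):
        # detections in this rectangle are most likely another red X2 near the staging area
        return -20 < ax < 0 and -10 < ay < 10

    def maybe_report_artifact(artf_type, ax, ay, az, report):
        if looks_like_other_robot(ax, ay):
            return False  # skip, do not waste a report attempt
        report(artf_type, ax, ay, az)
        return True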
Actually no, there is some other news, but no other code changes from
yesterday. The first note is regarding the logged output. I was too excited
yesterday morning, but it did not last long (1 hour?) — the next
rosout.log was cut! I was very upset about it, extended issue
#204, and even
wrote new code to be able to recover at least the last pieces of the OSGAR logfile
… only to find out later that it is rotated (indexed)!, i.e. beside the active
rosout.log there is also rosout.log.1 and sometimes rosout.log.2,
each 100MB long.
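So any recovery has to walk the rotated files oldest first — a trivial sketch (the helper is hypothetical, not from the repository):

    import os

    def rosout_files(log_dir):
        # rosout.log.2 is the oldest, rosout.log is the one currently being written
        names = ['rosout.log.2', 'rosout.log.1', 'rosout.log']
        return [os.path.join(log_dir, name) for name in names
                if os.path.exists(os.path.join(log_dir, name))]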
Another issue is with instances not starting properly, or taking a loooong time to be
removed (one simulation took 8 hours, where at least 2 hours were DeletingPods
with no errors, #235).
And finally old competitors are slowly showing up in the Leader Board
tables. There are 7 in Simple Tunnel 02 (including sodium-24-robotics)
and CYNET-ai is working on a submission
(#236).
One piece of news from this morning … I see a new report: TerminatingInstances Error:
AdminReview, so maybe they finally accepted that I really use the same docker
just many times, and that it is strange that it sometimes starts and sometimes does
not. Moreover, it is reduced to a single ROS C++ main (single threaded)
based on their subt_seed example … sigh.
9th October 2019 — The Gods Themselves (0 days)
Isaac Asimov wrote a nice novel,
The Gods Themselves, which
was also translated into Czech as
Ani sami
bohové. I had the Czech translation on my mind, which re-translated
back to English would be „Not Even The Gods Themselves” … you may not know
what I am talking about, but you may guess.
The simulation environment was upgraded yesterday. Yes, no kidding — see
the
release notes. The team Sophisticated Engineering just gave up and
deployed their solution while the machines were down (see
#241).
I submitted a small test (short, two robots in Simple Tunnel 02 … which
is not short, because it also tries to submit the OSGAR logs), it failed, was
reviewed by admins and restarted … it is not finished yet. Now it is the
next morning.
Sigh.
What I am testing is ver2505 and ver2509 — simply ver25 with a
smaller and a larger walldist parameter. These would be my notes for the launch
… in the unlikely (?) case I throw my notebook out of the window, so somebody
from the team (PavelJ?) can take over … BTW another joke: PavelJ wanted to
check the images locally, and that was the last thing he could do last night …
maybe he is still downloading now.
A10F2400L  B60F1500R  200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika:ver2505
C120F100F400F1000L  D400F100F400F1000R  200670743174.dkr.ecr.us-east-1.amazonaws.com/subt/robotika:ver2509
Final(?) strategy
… as SusanB would say 20+ years ago Everything is going to be
alright, because Life is Good (R) …
Yes, it is. The Virtual Track/Tunnel Circuit ends today.
p.s. I liked this screenshot:
It was for the same robot name codes in Practice Tunnel 01, when only robot D was
able to dump the OSGAR log into the rosout.log file. BTW robot C did
not start at all (see
#225).
10th October 2019 — The 5 stages DABDA or DARPA?
DABDA. … Denial; Anger; Bargaining; Depression; Acceptance.
It is not easy even to die … Error #5507: Circuit simulation submission limit reached
15th October 2019 — 5 days after
21st October 2019 — 11 days after
23rd October 2019 — Robotika Virtual Presentation
It is now two weeks after our final solution docker submission and one week to
the final DARPA announcement. A few days ago we received the following mail:
Virtual Tunnel Circuit Awards Ceremony Information for October 30th
Virtual Competitors, Please join us for the DARPA SubT Challenge Virtual
Tunnel Circuit Awards Ceremony & Technical Interchange Meeting on Wednesday,
October 30 at 11:00 AM EDT. The program will be broadcast via DARPA’s
YouTube channel: DARPAtv and will
last about an hour. For those who are unable to view the live stream in real
time, the broadcast will be recorded and posted to DARPAtv at the conclusion of
the event.
As part of the program, we request that each virtual team uses the attached
template to create a 5-minute presentation with recorded narration for DARPA to
broadcast. We look forward to celebrating our virtual competitors!
Well, so here is our Robotika Virtual
Presentation. It was created by a slightly poisoned mind … somehow the
two screenshots I posted 5 and 11 days after the final submission are still
relevant. They are interlaced every once in a while.
If you think that my spoken English is much better now, then you are wrong.
In order to reduce emotions I used robots to do the job, and
MichaelV3 from IBM Watson was
the best. A useful overview link was
how to use text2speech
in Python.
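For the record, the narration could be generated roughly like this with the ibm-watson Python SDK — a sketch only; the API key and service URL are placeholders and the actual script may have differed:

    from ibm_watson import TextToSpeechV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    authenticator = IAMAuthenticator('YOUR_API_KEY')   # placeholder credentials
    tts = TextToSpeechV1(authenticator=authenticator)
    tts.set_service_url('https://api.us-south.text-to-speech.watson.cloud.ibm.com')  # placeholder region

    text = 'Robotika participated in the Virtual Tunnel Circuit.'
    result = tts.synthesize(text, voice='en-US_MichaelV3Voice', accept='audio/wav').get_result()
    with open('narration.wav', 'wb') as audio_file:
        audio_file.write(result.content)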
I guess there will be no update until the final announcement, so we will see.
At the moment I am aware of 3 potential fatal bugs:
1) OSGAR is generating motion commands on a 10Hz timer. They are published to
ROS/Ignition while the CloudSim docker is still booting. I suppose that
ROS/Ignition should drop the commands as soon as the queue is full, but nobody
knows. Arthur mentioned this issue, but I do not know the conclusion yet.
2) ver25 uses laser scan differences to detect a stuck robot, and after 3 seconds
it generates a Collision. This should be ignored at the beginning during the turn
and straight motion towards the tunnel entrance, but I am not 100% sure (a rough
sketch of this detection follows after this list).
3) finally, the expected tunnel slope was never tested in Ignition. During
qualification the tunnels went up and down, so they could now trigger a return
home although the robot did not hit any stone. The limit in the released version
was 20 degrees, as used in the System Track.
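The detection from point 2 works roughly like this sketch (class name and the scan-change threshold are made up; only the 3 second window comes from the text):

    class StuckDetector:
        def __init__(self, timeout_sec=3.0, min_change=0.05):
            self.timeout_sec = timeout_sec
            self.min_change = min_change      # meters, hypothetical threshold
            self.prev_scan = None
            self.last_change_time = None

        def is_stuck(self, time_sec, scan):
            # scan: list of lidar ranges in meters; True when the scan has not
            # changed noticeably for longer than timeout_sec
            if self.last_change_time is None:
                self.last_change_time = time_sec
            if self.prev_scan is not None and len(scan) == len(self.prev_scan):
                max_diff = max(abs(a - b) for a, b in zip(scan, self.prev_scan))
                if max_diff > self.min_change:
                    self.last_change_time = time_sec
            self.prev_scan = scan
            return time_sec - self.last_change_time > self.timeout_sec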
So see you in a week on DARPAtv unless
some unexpected party happens …
p.s. if you ever encounter problems with PowerPoint and audio, replace the mp3
format by wav (see
MS
forum). Yes, the original 3MB presentation with voice will grow to 23MB …
yeah, rather no comment.
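One possible way to do the conversion in Python, assuming pydub and ffmpeg are available (not necessarily what was used here):

    from pydub import AudioSegment  # requires ffmpeg on the system

    # re-encode the narration so PowerPoint accepts it
    AudioSegment.from_mp3('narration.mp3').export('narration.wav', format='wav')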
30th October 2019 — Tunnel Circuit Awards Ceremony
Virtual Competitors,
I hope you are having a great week so far. We are getting ready for the Awards
Ceremony tomorrow and wanted to make sure we had all of the updated
information.
You can access the direct link below to watch the stream of the awards
ceremony starting at 11:00 am EDT https://www.youtube.com/watch?v=JxwL2eGQQs8.
For those who are unable to view the live stream in real time, the broadcast
will be recorded and posted to DARPAtv at the conclusion of the event.
As a reminder at the end of the program Dr. Chung will respond to questions
collected via the community forum. In order to submit questions you will need
to create an account (if you haven’t already) and post the question under the
“Questions” category and add the tag virtual-tunnelawards. A limited number
of questions will be answered live; however, Q&A will be posted on the
community forum after the meeting.
OK, so for us it means today (30th October 2019) at 4pm Central European Time.
I posted 3 questions related to my complaints:
and in less than 10 hours we should see:
31st October 2019 — Robotika Wins 100000USD!
table of results
3rd November 2019 — Robotika Virtual Videos
OSGAR logfiles
- 01/A10F2400L — aws-zmq-subt-x2-191011_124211.log, https://youtu.be/WiunGpoDvUE
- 02/B60F1500R — aws-zmq-subt-x2-191011_124204.log, https://youtu.be/HsycOeeyqtY
- 02/D400F100F400F1000R — aws-zmq-subt-x2-191011_124206.log, https://youtu.be/zm7ktpMHpWg
- 03/D400F100F400F1000R — aws-zmq-subt-x2-191011_124228.log, https://youtu.be/XX_HBa2I4ZE
- 10/C120F100F400F1000L — aws-zmq-subt-x2-191022_183425.log, https://youtu.be/yJhcvtdr-Oo
- 15/B60F1500R — aws-zmq-subt-x2-191023_125435.log, https://youtu.be/mvNtzEm7ME0
For more info see issue #258.