United Kingdom

Flaky Serial Link With V4

DJ, I hope you can help. I am experiencing major problems with the reliability of the serial link, which concerns me a lot. As it stands, the problem is bad enough to undermine all my work over the last year on using the v4 for the ALTAIR robots.

The new ALTAIR heads have multiple PICs that all report back to a Master PIC, which itself communicates with the v4. During testing I keep noticing that the v4 drops out randomly and also breaks all serial comms at random times. In an attempt to isolate the problem, I shut down all the interrupts buzzing around the PIC network, so that the Master was simply communicating with the v4 with no external interrupts. I then ran some simple code that sends 3 bytes to the v4 every 100mS, then increments the bytes, and so on. At the v4 end, the script looks for the 3 bytes in the buffer, reads them and increments a $times counter. What happens is that the v4 either breaks serial comms or has byte read errors at random times.

When the fault occurs, serial comms stop completely and it is necessary to stop the script and start it again. I have found this happens when I put a static "3" in the UartAvailable line (for example if UartAvailable(0,0) = 3), but this should not be a problem, as the v4 is only ever going to get an isolated packet of 3 bytes? To get around this, if I change the static "3" to ">= 2" (for example if UartAvailable(0,0) >= 2) then the comms do not stop or error (so much). Of course this means that a packet misread has occurred: the buffer has accepted a 2 byte packet, which means we have lost a byte.

From my tests, in general it looks like the bytes available error occurs once in roughly 100 packet sends and a byte error (data byte read incorrectly) occurs around once every 250 packet sends. Below is a screen dump from one test which shows the errors from just over 1600 packet sends.

User-inserted image

Unreliable serial is a major setback for me, as all my robots use serially linked PIC subsystem modules, so these dropouts, where the serial link can also just stop, are a complete disaster for my robot designs!

Tony


PRO
Synthiam
#1  

Share your code - is there a delay in the loop? It sounds like the data channel is flooded.

PRO
United Kingdom
#2  

DJ, there is a 100/10mS delay in the loop, so there should be no channel flooding. I did another long test run and got 220 available errors in 16000 packet sends - this time there was no disconnection. I have just done another test and got 11 available errors in 536 packet sends, and then the v4 disconnected.

I hope it is me doing something wrong, as I badly want this all to work! Here is my code; it's a bit messy as I have been trying a lot of things to make all this work. I can try some CRC and error checking routines in the master PIC, but I would really prefer it to work efficiently without the need for these.

Tony


:MAIN_LOOP

# bytes in from master PIC
# only process data if the correct packet size is available
$available = (UartAvailable(0,0))
#if (UartAvailable(0,0)>= 2)
if ($available = 0)
  goto (MAIN_LOOP)
endif

if ($available != 3)
  $error++
  print ("available error")
  print ($available)
  print ($error)
endif

if ($available >= 2)
  # a valid packet is ready - read the packet
  UARTReadBinary(0,0,3,$inputdata)
  $byte1_in  = $inputdata[0]   
  $byte2_in  = $inputdata[1]
  $byte3_in  = $inputdata[2]  # 8 thermal pixel data
  $times++
  $new_data = 0

  print($byte1_in +":"+ $byte2_in +":"+ $byte3_in)
  if ($last_byte1 != $byte1_in or $last_byte2 != $byte2_in or $last_byte3 != $byte3_in)
    # latest data packet is different to last
    $new_data = 1
    print("new data")
    $last_byte1 = $byte1_in
    $last_byte2 = $byte2_in
    $last_byte3 = $byte3_in
  endif

else
  $times2++
  sleep(100)
endif  
    
#print($byte1_in +":"+ $byte2_in +":"+ $byte3_in)
sleep (10)
GOTO(MAIN_LOOP)

PRO
United Kingdom
#3  

I did an extensive test with the serial connection to the v4 with the EZ:1 head PIC network switched back on (sending live data packets). I ran it for a few hours and got 157 available errors in 18500 packet sends. The good news is there was no disconnection this time. Thanks in advance, DJ, for anything you can advise here.

With the EZ:1 head open, it feels like I am doing a bit of brain surgery on the poor bot!

User-inserted image

Tony

PRO
United Kingdom
#4  

Hi Jeremie, thanks for looking into this for me. To me it's starting to look like it's the old latency issue again; here is an earlier thread on this.

synthiam.com/Community/Questions/8067

It looks like a rock steady reading of, say, an external digital line needs a 500mS delay, and continuous reading of a serial input seems to need a huge 1000mS (or higher) delay between packet reads before it starts being a reliable link. Shorter delays will work, but read errors will increase.

The following are the results for delays added (between packet reads) to the master PIC's main loop communicating with the v4:

100mS delay in main loop gives 140 available errors in 500 sends
500mS delay in main loop gives 40 available errors in 500 sends
1000mS delay in main loop gives 2 available errors in 500 sends
2000mS delay in main loop gives 0 (zero) available errors in 500 sends

After longer tests, even 2 second delays get a small number of available errors. Most of the available errors are "2" or "6" from a 3 byte packet sent, although I have also seen "9" quite a few times as the number of bytes that the v4 reports as available.

I also keep getting random disconnects, which only seem to happen in the UART read mode.

Some further info - the PIC to v4 baud rate is 19200, so the 3 byte packet takes under 2 milliseconds to transmit.
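As a quick check of that figure, the arithmetic can be sketched in Python. This assumes standard 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte), which is not stated in the thread, and the function name is purely illustrative.

```python
def packet_time_ms(num_bytes, baud, bits_per_byte=10):
    """Time to clock a packet out on the wire, in milliseconds.

    bits_per_byte=10 assumes 8N1 framing (start + 8 data + stop).
    """
    return num_bytes * bits_per_byte / baud * 1000.0


# Compare a few common baud rates for the 3 byte packet used in this thread.
for baud in (9600, 19200, 115200):
    print(f"{baud:>6} baud: {packet_time_ms(3, baud):.2f} ms for a 3 byte packet")
```

Under that assumption a 3 byte packet at 19200 baud occupies the line for roughly 1.56 ms, which agrees with the "under 2 milliseconds" figure.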

Fortunately for me the head PIC network does most of the work (over 90%) and only sends processed data to the v4, so I can try to live with these huge time delays.

For me though, this is a major flaw in using the v4, but I am going to try some CRC algorithms so I do not lose data - this of course will slow comms down even more.
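For anyone wanting to try the same idea, here is a minimal Python sketch of a CRC-8 check on a 3 byte packet. The polynomial (0x07), the zero initial value, and the frame layout of three data bytes followed by one CRC byte are assumptions chosen for illustration; they are not the scheme used on the ALTAIR PICs.

```python
def crc8(data, poly=0x07, init=0x00):
    """Bitwise CRC-8, MSB first (polynomial x^8 + x^2 + x + 1 by default)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def make_frame(payload):
    """Append the CRC byte to the payload (hypothetical 4 byte frame)."""
    return bytes(payload) + bytes([crc8(payload)])


def frame_ok(frame):
    """True if the last byte matches the CRC of the bytes before it."""
    return len(frame) >= 2 and crc8(frame[:-1]) == frame[-1]


frame = make_frame([0x10, 0x20, 0x30])
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit in the first byte

print("intact frame accepted:", frame_ok(frame))
print("corrupted frame accepted:", frame_ok(corrupted))
```

The cost is one extra byte per packet plus the time to compute the check at each end, which is the slowdown mentioned above.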

My expertise is not in wifi, so I cannot be sure if this is a problem, but the wifi channel was not busy when these tests were carried out, and I do not have any wifi issues outside of ARC.

Tony

#5  

I was wondering, what happens at lower baud rates? Usually when data corruption is the issue, the best solution is to go to a lower rate. Increasing the time between readings to a second or more seems like going too far to get reliable communications. Baud rate shifting often happens automatically in the handshaking process between units that use serial communications.

That, and using better quality cabling between the units involved in the communication process. This is especially important at lower voltages, with equipment that has inadequate shielding, or in setups with too much inherent capacitance between the transmission ends. Capacitance (or inductance) in the connecting wires can be a signal killer. Have you actually looked at the transmission signal itself to see what quality of pulses you are sending/receiving? The better choice is shielded cable, which often has a specified characteristic impedance (a capacitance-inductance-resistance combination). Of course your interconnection wiring here is short, but it is not to be simply dismissed when dealing with higher baud rates. In these days of gigahertz device operation we often take high data rates for granted and overlook the potential problems, but they can still crop up, as may be the case here. Even at "only" 19200 baud.

Going with a lower baud rate would certainly get much more data reliably across the channel, even at very low baud rates, than waiting upwards of seconds between data transmissions. If the PIC is unable to go lower, I would suggest a different interface device or a custom unit that can go to lower rates.

PRO
United Kingdom
#6  

@WBS00001, I tried lower baud rates and it makes no difference - in fact 9600 seemed to get more errors, which does not make any sense. I scoped the transmission and it looks clean and sharp. The main problem is getting the PIC to handshake with the v4. I tried a hardware RTS (request to send) line from the PIC to the v4 to initiate a (synchronised) handshake, but because of the v4 port latency problems (see my other thread) the strobe pulse has to be over 500mS long before it is reliably accepted. DJ confirms this problem in the other thread.

What I am now doing is sending a dummy byte from the v4 and making the PIC wait for this before sending any data. It's a crude handshake, clunky and not very efficient, but I have been running the test code for a couple of hours with no errors at all. This is OK as I can make the PIC sync with the v4, but a device like the Omron B5T sensor will not have this facility, so some of the continuous data sent from it will get lost if it is directly connected to the v4 UART.

Tony

#7  

Hey Tony,

Very interesting problem, one which I'm sure others will be interested in and might like to help with. When I examine the issue I break it down like this:

  1. Custom PIC board sends serial data out
  2. Data is sent to the physical EZ-B v4 via the UART port
  3. EZ-B v4 sends data wirelessly via Wi-Fi to the network or PC
  4. PC receives Wi-Fi or network traffic from the EZ-B v4
  5. ARC software running on the PC receives the data
  6. Scripts in ARC interpret and display the data sent from the custom PIC

In my mind, there are 5 points of failure for the communication drops. I have a few guesses as to what might cause it (we probably all do) but maybe some different troubleshooting and experimenting would narrow down the scope of the issue.

The biggest question might be: is this an issue with the serial protocol on the PIC side or the EZ-B side? My guess is you have a higher level of knowledge in this area than I do, but over years of working with serial devices, professionally and for personal use, I've witnessed a wide variety of issues related to latency, timing, baud and handshake protocol. It seems like serial should just be serial, but that has not been my experience.

I'm curious if you have the ability to do some other experimenting?

Experiment 1: Take a popular, well-known microcontroller such as an Arduino, connect it to the UART and run similar PING-style tests. Are packets also lost?

Experiment 2: Connect the PIC serial output to the PC and run a similar test. Are packets also lost?

PRO
United Kingdom
#8  

Hi Justin, firstly thanks for your and @WBS00001's input on this, it is appreciated.

There is no trouble with the PIC comms side - I have been writing PIC code since 1995 so I am very proficient in PIC serial comms. In fact the ALTAIR head has 3 networked PICs all working in unison (using quite complex low/high priority interrupts) and it all works fine. The master_PIC connects to the v4 and this is where it all goes wrong - I have checked the master_PIC to v4 serial link on a custom PIC terminal that I use for testing and that is working as expected. I have never used an Arduino in my life, I just like coding PICs which I am now pretty efficient at!

The thing with the PING/Arduino style tests is that you will probably never notice the missing data/packets as most will get through - it is only that I am counting/analyzing every data transaction that I am seeing the drop outs and errors. Of course with something like the B5T sensor, missing packets can throw the whole thing out.

Using the dummy send byte method from the v4 (to form a crude handshake) with the master_PIC is working - the test has been sending live data from the master_PIC and head network for 4 hours now with no errors. This is not the best option, but at least the master_PIC now has a reliable comms connection with the v4.

Tony

PRO
United Kingdom
#9  

I am now continually getting "nagged" to post that my forum thread "Flaky serial link with v4" has been resolved when it has not. Some things are probably not resolvable, and the fact that DJ and Jeremie have never come back with a solution maybe confirms that the serial link can be flaky?

I think it's pretty poor to have to pretend that someone has resolved my help request just to stop (what in my opinion are) these unnecessary emails. Because of this, this will be the last time I post for assistance.

EZ-Robot, can I suggest that you allow for the situation that a help request cannot be resolved, so it stops these continuously annoying emails being sent in these cases. Thanks

Tony

PRO
USA
#10  

I hate that nag and stopped using the assistance option because of it. I'm in line with Tony on this: it is absolutely unnecessary and annoying. It seems to be triggered by the number of days a thread stays unresolved. Please remove this nag, EZ-Robot.

Sorry now back to Tony's thread.....

#11  

@Toymaker Yeah, it is annoying. I started getting them even though no one had responded to one of my posts at all. Just my original post in the thread, nothing else. Steve G was kind enough to "bump" the thread to try to get some response. But the fact remains it is still unresolved, and I will continue to get notices of that fact what seems like every other day. Obviously these notices are automated, but maybe there need to be humans looking at the situations to keep them from going out unnecessarily. Especially considering that you can be banned for not marking the thread resolved within some amorphous time period. In some cases it just takes time to get an issue resolved. Perhaps a lot of time. Getting unending notices so soon and so often does not help the situation. Worse, it can lead to simply giving up on the product altogether.

PRO
Canada
#12  

While I can't do anything about the email notifications, that's @DJ's department:) I can mention that I haven't had a chance to look at the serial link yet but I should have time in the next little bit.

Tony quick question for you, are you using internal RC oscillators with your PICs or external crystals?

PRO
United Kingdom
#13  

Thanks Will and WBS for your comments, I am relieved that it is not just me that gets wound up on this!

I now get a "nag" email every day, with the usual mandatory threats.

I am quite offended by the threat that we can be banned if we do not accept that the thread has been resolved within a certain time period, when obviously on some occasions it has not! To EZ-Robot: I am not going to pretend this thread has been resolved when it has not - so to all my forum friends, if I suddenly disappear from the group you know why!

I will not conform to this treatment, (and being treated like some sort of errant child) so I may well get banned!

DJ and EZ-Robot, with respect, you need to look at this seriously and urgently, or else you will lose valuable users and future customers.

My recommendation to forum members is to never use "assistance required", I am sure help will still be given from this great community.

Tony

PRO
United Kingdom
#14  

Jeremie, I am using internal oscillators but am 100% sure this is not the problem - with the latest generation of PICs, I have never seen timing issues using RC. Also, the head PICs are all serially linked at the same baud rate and there are no issues there. The problem has to be at the v4 buffer end. What I have done now is to get the master_PIC to wait for the v4 (to flag) before any data transactions are attempted. Obviously this is not optimal if very important data needs to be sent to the v4, as there can be appreciable delays before it receives the data, but this is the only way I can get the data packets to transfer reliably. Ideally it would be good to be able to send urgent data over to the v4 instantly, without having to wait for another v4-instigated cycle.

More info on this, the master_PIC serial link to v4 has a "high priority" interrupt, so when the v4 sends a dummy (or command) byte the master_PIC stops whatever it is doing (jumps to the interrupt) and sends the latest data packet (containing sensory data etc from the networked PICs) back to the v4. The v4 has to instigate the transaction, not having interrupts on the v4 is one reason for these limitations.

Also, the random disconnections are a real nuisance. The longest run I have got is 4 hours, but mostly there is a disconnection every 2 hours or so.

Tony

Ireland
#15  

Tony, like many others I wish I could jump in and resolve the serial communication issue, but personally I have little knowledge in this field. May I make a suggestion?

Use two EZ-Bs connected by serial (no other devices connected) and see if you can replicate the fault. If that is possible, it may identify where the issue lies. Then, with your skills, you will find a way to resolve it.

Pat

PRO
Synthiam
#16  

This is a logic oversight and not a bug with the ez-b or flaky serial. The IF condition which checks for the available bytes not equaling 3 is doing its job correctly. The reason for the behavior which you have identified is that at the exact moment UartAvailable() returns, there are indeed 0, 1 or 2 bytes in the buffer and not 3 as you were hoping. This is because the other serial device is transmitting at a baud rate, and each byte takes time to arrive.

The logical challenge is a result of what you're not used to with micros. The EZ-B v4 isn't a micro, it's a program on a micro that does a bunch of stuff and communicates over WiFi. So, Tony's past experience as a micro assembler programmer is to check the UART Receive bit on the micro to instantly (within a microsecond loop) see if there is data to be read. However, on the EZ-B v4 you can only check to see if data is available at a much much slower interval (milliseconds vs microseconds). This dramatically increases the chance of checking for data while the data is still being transmitted from the serial master device.

You will always run into a situation where the number of expected bytes has not been transmitted, yet. And that's expected behavior for this design - delay, loop and the next time the data will be available.
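The effect can be illustrated with a toy timing model in Python. This is only a sketch under stated assumptions: it is not a model of the EZ-B firmware or of WiFi latency, the per-byte time assumes 19200 baud with 10 bits per byte, and the send and poll periods (100 ms and 100.1 ms) are made-up values chosen so that the polls slowly drift into a packet. All names are hypothetical.

```python
BYTE_US = 521        # approx. time per byte at 19200 baud, 10 bits per byte (assumed)
PACKET_BYTES = 3


def bytes_arrived(t_us, send_period_us):
    """Total bytes fully received by time t_us; one packet starts each period."""
    periods, offset = divmod(t_us, send_period_us)
    in_flight = min(PACKET_BYTES, offset // BYTE_US)
    return periods * PACKET_BYTES + in_flight


def run(poll_period_us, send_period_us, duration_us, strict):
    """Poll the byte count; return (packets_read, partial_counts_seen)."""
    consumed = packets = partial = 0
    for t in range(0, duration_us, poll_period_us):
        available = bytes_arrived(t, send_period_us) - consumed
        if available % PACKET_BYTES != 0:
            partial += 1                   # caught the sender mid-packet
        if strict:
            ready = available == PACKET_BYTES
        else:
            ready = available >= PACKET_BYTES
        if ready:
            consumed += PACKET_BYTES       # read exactly one packet
            packets += 1
    return packets, partial


strict_result = run(100_100, 100_000, 2_000_000, strict=True)
lenient_result = run(100_100, 100_000, 2_000_000, strict=False)
print("== 3 test (packets, partial counts):", strict_result)
print(">= 3 test (packets, partial counts):", lenient_result)
```

In this particular run both tests see counts that are not a multiple of 3, simply because a poll landed while a packet was still arriving. The `== 3` test stops reading after its first such poll, since the backlog never returns to exactly 3, while the `>= 3` test keeps pulling one whole packet per poll and leaves the partial bytes for next time.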

My advice is...

  1. Ignore the error condition that you created, because it will occur whenever the condition executes while data is in the middle of being transmitted.

  2. Treat the condition with the same logic as the original example that I provided when the question was first asked. Reference this thread: https://synthiam.com/Community/Questions/8067

You will notice that in my original response to you, I had written a loop which checks for the availability to be EQUAL to or GREATER than the packet size. From that, I pull the data in increments of the packet. Here is an example, merged with your most recent code...


# This is a demo on how to wait for a specific UART packet size
# and process the data when it has been received

# Initialize the UART (this also resets the ez-b's input buffer)
UARTInit( 0, 0, 9600 )

:loop

# Only process data if the packet size is available (packet size is 3 bytes)
# If the expected data size is not available, assume we're in the
# middle of transmission, delay and loop.
IF (UartAvailable(0, 0) >= 3)

  # A valid packet is ready. Read the packet
  UARTReadBinary( 0, 0, 3, $inputData )

  $byte1_in  = $inputdata[0]
  $byte2_in  = $inputdata[1]
  $byte3_in  = $inputdata[2]  # 8 thermal pixel data
  $times++
  $new_data = 0

  print($byte1_in +":"+ $byte2_in +":"+ $byte3_in)

  IF ($last_byte1 != $byte1_in or $last_byte2 != $byte2_in or $last_byte3 != $byte3_in)
    # latest data packet is different to last
    $new_data = 1
    print("new data")
    $last_byte1 = $byte1_in
    $last_byte2 = $byte2_in
    $last_byte3 = $byte3_in
  ENDIF

ELSE

  # no valid packet is available
  # pause before we check again
  # to be friendly on the communication channel

  sleep(100)

ENDIF

goto(loop)

Some additional information on how the EZ-B v4 serial works may help set your mind at ease. The STM32 ARM micro in the EZ-B v4 has DMA for the UART receive and transmit. I use only the receive DMA, and it is enabled when the UartInit() method is called, which configures the DMA and UART parameters. The DMA has a large array in memory in which it stores the incoming bytes automatically. This is a fantastic hardware configuration and requires only configuration code and no "program" code. The data is stored automatically in a buffer, and the DMA knows its position in the array from a CPU register.

When ARC calls UartAvailable(), the EZ-B v4 returns the data in the register which is the number of bytes currently populated.

When you read the Uart, the register is not reset. As you would expect, the number of bytes you read are subtracted from the register. This means it maintains the bytes which you haven't read. The only way to fully clear the register and reset it to 0 is to read all available byte data.
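The counting behavior described here can be pictured with a small Python model. It is a toy stand-in, not the firmware: the class and method names are hypothetical, and returning fewer bytes when fewer are available is an assumption of the sketch rather than documented behavior of UARTReadBinary.

```python
from collections import deque


class RxBufferModel:
    """Toy stand-in for a receive buffer with an 'available' count."""

    def __init__(self):
        self._data = deque()

    def receive(self, data):
        """Bytes arriving from the wire are appended automatically."""
        self._data.extend(data)

    def available(self):
        """Number of unread bytes currently held."""
        return len(self._data)

    def read(self, count):
        """Remove and return up to `count` of the oldest unread bytes."""
        n = min(count, len(self._data))
        return bytes(self._data.popleft() for _ in range(n))


buf = RxBufferModel()
buf.receive(b"\x01\x02\x03\x04\x05")   # one full packet plus 2 bytes of the next
packet = buf.read(3)                   # take exactly one packet
print(list(packet), buf.available())   # prints [1, 2, 3] 2
```

Reading one packet leaves the 2 unread bytes in place, so the next check of the count sees them plus whatever has arrived since.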

The EZ-B v4 micro has a hardware configuration, not a software program that handles the UART. In my very extensive tests, the hardware of the STM32 micro has not had any issues.

#17  

Hey Tony,

About the daily "nag" email: I too have got those and found them annoying, but my take on it was a little different. From my view it was likely implemented to encourage folks to check whether they have a resolution, but it comes across as a little naggy. I never viewed the threat of being banned as serious or even directed at me, as I view myself as a good community member (well, at least borderline "OK", LOL); I viewed it as directed at those who would abuse the message boards.

But given your and others' feedback, my hope is that EZ-Robot will implement some wording changes to the notification, some logical rule changes, and some more options to denote the state of a help request.

I'm glad you brought the topic up, Tony, but my hope is that you don't leave the forum over it or take the notifications personally. I think EZ-Robot staff have been good about making changes that users bring to their attention, and I'm sure this will be improved as well.

PRO
Canada
#18  

Thanks Justin!

I was going to mention the same thing, the automated responses aren't directed at the people who are regularly engaged in the forum but rather those that come and go and sometimes forget to close their thread.

Tony,

I did some tests last night with an Arduino (which has an external crystal) with your code on the ARC side and found much the same results as you did. I figured I'd try to turn over a few stones to see if anything like an internal RC oscillator might be related but, nope, it wasn't. I will have to defer to @DJ's response on this one as he is much more knowledgeable than I am in this department.

I guess I should ask the question that hasn't been asked yet: what is your goal? Maybe we can find another route to achieve what you are going for. Are you looking for the fastest way to get real-time data from the sensors directly to the GUI? Could you possibly forgo the master PIC and use UART0, 1, 2 from the ez-b directly to the slave PICs?

PRO
United Kingdom
#19  

Hi DJ (and also Jeremie), thanks for the detailed response, it is appreciated. From my observations over the last week or so, I had guessed this was the way that the v4 worked.

DJ, I did use your code extract and it helped a lot but did not fully stop the errors - it is the 100mS delay as you suggest "to be friendly to the communications channel" before looking again that is the killer, this is a huge amount of lost time for the tech that we use today.

For the record, and to hopefully help others, here is my working solution, which does not need such a long delay and gives a reliable data transfer:

  1. Master_PIC does its own work/processing which can be instantly paused by a high level interrupt from the v4 (on the serial line).

  2. The v4 instigates a data transfer either by sending a dummy byte or a command packet, this causes the master_PIC to jump to the HP (high priority) interrupt routine - the master_PIC now reads in the byte or (genuine) command packet.

  3. I found as DJ mentioned in his explanation, that the PIC response is too fast to straight away send back a data packet to the v4, so the PIC needs to wait around 10mS (in its Interrupt handler routine) and then send the packet to the v4 - it seems that any delay < 7mS brings back the errors again as DJ explained.

Using the above method, I just finished a test run of over 40,000 data transactions between the v4 and the master_PIC and only logged one error which I can live with.
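The request-then-reply pattern in the steps above can be sketched in Python with two threads and a pair of queues standing in for the serial lines. This is an illustration only: the function names are hypothetical, the queues pass whole packets rather than a byte stream, and the request byte "A" and the 10 ms pause are simply the values mentioned in this thread.

```python
import queue
import threading
import time

PACKET_SIZE = 3


def master_pic(requests, replies, stop, reply_delay_s=0.010):
    """Stand-in for the PIC: wait for a request byte, pause, send a packet."""
    counter = 0
    while not stop.is_set():
        try:
            requests.get(timeout=0.05)        # plays the role of the interrupt
        except queue.Empty:
            continue
        time.sleep(reply_delay_s)             # the ~10 ms pause before replying
        packet = bytes((counter + i) % 256 for i in range(PACKET_SIZE))
        replies.put(packet)
        counter += 1


def host_poll(requests, replies, transactions):
    """Stand-in for the v4 script: request, then wait for the whole packet."""
    received = []
    for _ in range(transactions):
        requests.put(b"A")                    # dummy byte that starts a transfer
        received.append(replies.get(timeout=1.0))
    return received


requests, replies, stop = queue.Queue(), queue.Queue(), threading.Event()
worker = threading.Thread(target=master_pic, args=(requests, replies, stop))
worker.start()
packets = host_poll(requests, replies, transactions=5)
stop.set()
worker.join()
print([list(p) for p in packets])
```

Because the receiving side decides when each transfer starts, it never has to guess whether a packet is half way through arriving. The trade-off, as noted elsewhere in the thread, is that the sender cannot push urgent data on its own schedule.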

Again thanks for your input.

Tony

PRO
Synthiam
#20  

Tony, what baud rate are you using? Consider the transfer time of your baud rate when pushing for a 10ms interval on the ez-b v4.

Also note that the word "errors" is incorrect. "Expecting" the packet to have been transferred before transmission has ended is not an error. Again, this is a logic challenge and once it's understood that the ez-b v4 and the other serial device have no way of synchronizing...

The only option to synchronize is to do this - and ensure you're using a very high baud rate of maybe 115k or higher if possible.


# This is a demo on how to wait for a specific UART packet size
# and process the data when it has been received

# Initialize the UART (this also resets the ez-b's input buffer)
UARTInit( 0, 0, 9600 )

:loop

# Only process data if the packet size is available (packet size is 3 bytes)
# If the expected data size is not available, assume we're in the
# middle of transmission, delay and loop.
IF (UartAvailable(0, 0) >= 3)

  # A valid packet is ready. Read the packet
  UARTReadBinary( 0, 0, 3, $inputData )

  $byte1_in  = $inputdata[0]
  $byte2_in  = $inputdata[1]
  $byte3_in  = $inputdata[2]  # 8 thermal pixel data
  $times++
  $new_data = 0

  print($byte1_in +":"+ $byte2_in +":"+ $byte3_in)

  IF ($last_byte1 != $byte1_in or $last_byte2 != $byte2_in or $last_byte3 != $byte3_in)
    # latest data packet is different to last
    $new_data = 1
    print("new data")
    $last_byte1 = $byte1_in
    $last_byte2 = $byte2_in
    $last_byte3 = $byte3_in
  ENDIF

ELSE

  # no valid packet is available
  # send a synchro packet which instructs the other device to
  # transmit the packet
  # pause so we can wait for the communication delay

  UartWrite(0, 0, "A")

  sleep(20)

ENDIF

goto(loop)

PRO
United Kingdom
#21  

Hi DJ, thanks for the further script. As I said in my post #19, I have already found a method for reliable data transfer to the v4, using a synchronization similar to your suggestion (a sort of handshake), which I explained in that post. It is all working OK now.

The baud rate I am using is 19200, so the 3 byte packet takes under 2 milliseconds to transmit.

You are correct "expecting" is a better term for the event, I now understand how the v4 operates thanks to your guidance.

Tony

PRO
Synthiam
#22  

Awesome Tony - glad it's working out! With the next update of ARC, you can create controls specific for your robot, with branding, etc...

PRO
USA
#23  

Looking forward to that!