DJ, I hope you can help. I am experiencing major problems with the reliability of the serial link, which concerns me a lot. As it stands, the problem is bad enough to undermine all my work (over the last year) on using the v4 for the ALTAIR robots.
The new ALTAIR heads have multiple PICs that all report back to a Master PIC, which itself communicates with the v4. On testing I keep noticing that the v4 drops out randomly and also breaks all serial comms at random times. In an attempt to discover the problem, I shut down all the interrupts buzzing around the PIC network, so that the Master was simply communicating with the v4 with no external interrupts. I then ran some simple code that just sends 3 bytes to the v4 (every 100ms), then increments the bytes, and so on. At the v4 end, the script just looks for the 3 bytes in the buffer, reads them and increments a $times counter. What happens is that the v4 just breaks serial comms or has byte read errors at random times.
When the fault occurs, serial comms completely stop and it is necessary to stop the script and start it again. I have found this happens when I put a static "3" in the UartAvailable line (for example if UartAvailable(0,0) = 3), but this should not be a problem, as the v4 is only ever going to get an isolated packet of 3 bytes? To get around this, if I change the static "3" to ">= 2" (for example if UartAvailable(0,0) >= 2) then the comms do not stop or error (so much), but of course this means that a packet misread has occurred and the buffer has accepted a 2 byte packet, which means we have lost a byte.
From my tests, in general it looks like the bytes available error occurs once in roughly 100 packet sends and a byte error (data byte read incorrectly) occurs around once every 250 packet sends. Below is a screen dump from one test which shows the errors from just over 1600 packet sends.
Unreliable serial is a major setback for me, as all my robots use serial-linked subsystem PIC modules, so these dropouts, where serial links can also just stop, are a complete disaster for my robot designs!
Tony
www.ez-robot.com/Community/Forum/Thread?threadId=8067
It looks as though a rock-steady reading of, say, an external digital line needs a 500ms delay, and continuous reading of a serial input seems to need a huge 1000ms (or higher) delay between packet reads before the link starts to be reliable. Shorter delays will work, but read errors will increase.
The following delays were added (between packet reads) to the master PIC's main loop communicating with the v4:
100ms delay in main loop gives 140 available errors in 500 sends
500ms delay in main loop gives 40 available errors in 500 sends
1000ms delay in main loop gives 2 available errors in 500 sends
2000ms delay in main loop gives 0 (zero) available errors in 500 sends
After longer tests, even 2 second delays get a small number of available errors. Most of the available errors are "2" or "6" from a 3 byte packet sent, although quite a few times I have seen "9" as the value the v4 reports as available.
I also keep getting random disconnects, which only seem to happen in the UART read mode.
Some further info - the PIC to v4 baud rate is 19200, so the 3 byte packet takes under 2 milliseconds to transmit.
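As a rough check of that figure, the packet time can be worked out from the baud rate. This sketch assumes 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte), which the thread does not state; `packet_time_ms` is an illustrative helper, not part of ARC or the PIC firmware.

```python
BITS_PER_BYTE_8N1 = 10  # assumption: 1 start + 8 data + 1 stop bit


def packet_time_ms(baud, n_bytes, bits_per_byte=BITS_PER_BYTE_8N1):
    """Time in milliseconds to clock n_bytes onto the wire at the given baud rate."""
    return n_bytes * bits_per_byte / baud * 1000.0


if __name__ == "__main__":
    # Compare a few common baud rates for the 3-byte test packet.
    for baud in (9600, 19200, 115200):
        print(f"{baud:>6} baud: 3-byte packet takes {packet_time_ms(baud, 3):.3f} ms")
```

Under that framing assumption, the 3-byte packet at 19200 baud takes about 1.56 ms on the wire, which agrees with "under 2 milliseconds".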
Fortunately for me the head PIC network does most of the work (over 90%) and only sends processed data to the v4, so I can try to live with these huge time delays.
For me though, this is a major flaw in using the v4, but I am going to try some CRC algorithms so I do not lose data - this of course will slow comms down even more.
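For anyone following along, a check byte of the kind mentioned here could look like the sketch below. It is not Tony's algorithm (the thread does not show one): it is a generic bitwise CRC-8 with polynomial 0x07 and a zero initial value, chosen only for illustration, and the `frame` and `check` helpers are hypothetical names. It is written in Python to show the logic; the PIC and ARC sides would need their own equivalents.

```python
def crc8(data, poly=0x07, init=0x00):
    """Bitwise CRC-8, MSB first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def frame(payload):
    """Sender side: append the CRC byte to the payload."""
    return bytes(payload) + bytes([crc8(payload)])


def check(framed):
    """Receiver side: True if the trailing CRC byte matches the payload."""
    return len(framed) >= 2 and crc8(framed[:-1]) == framed[-1]


if __name__ == "__main__":
    packet = frame(bytes([10, 20, 30]))      # 3 data bytes + 1 CRC byte
    print(check(packet))                     # intact packet passes
    damaged = bytes([packet[0] ^ 0x01]) + packet[1:]
    print(check(damaged))                    # single flipped bit is detected
```

The cost is one extra byte per packet, so a 3-byte packet becomes 4 bytes on the wire.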
My expertise is not in wifi, so I cannot be sure if this is a problem, but the wifi channel was not busy when these tests were carried out, and I do not have any wifi issues outside of ARC.
Tony
Going with a lower baud rate would certainly get much more data reliably across the channel, even at very low baud rates, than waiting upwards of seconds between data transmissions. If the PIC is unable to go lower, I would suggest a different interface device or a custom unit that can go to lower rates.
What I am now doing is sending a dummy byte from the v4 and making the PIC wait for this before sending any data - it's a crude handshake and is clunky and not very efficient, but I have been running the test code for a couple of hours with no errors at all. This is OK as I can make the PIC sync with the v4, but a device like the B5T Omron sensor will not have this facility, so some of the continuous data sent from it will get lost if it is directly connected to the v4 UART.
Tony
Very interesting problem, one which I'm sure others will be interested in and might like to help with. When I examine the issue I break it down like this:
1. Custom PIC board sends serial data out
2. Data is sent to the physical EZ-B v4 via the UART port
3. EZ-B v4 sends data wirelessly via Wi-Fi to the network or PC
4. PC receives Wi-Fi or network traffic from the EZ-B v4
5. ARC software running on the PC receives the data
6. Scripts in ARC interpret and display the data sent from the custom PIC.
In my mind, there are 5 points of failure for the communication drops. I have a few guesses as to what might cause it (we probably all do) but maybe some different troubleshooting and experimenting would narrow down the scope of the issue.
The biggest question might be whether this is an issue with the serial protocol on the PIC side or the EZ-B side. My guess is you have a higher level of knowledge in this area than I do, but over the years of working with serial devices, professionally and for personal use, I've witnessed a wide variety of issues related to latency, timing, baud and handshake protocol. It seems like serial is serial, but that has not been my experience.
I'm curious if you have the ability to do some other experimenting?
Experiment 1: Take a popular, known microcontroller like an Arduino, connect it to the UART and run similar PING-style tests to see if packets are also lost.
Experiment 2: Connect the PIC serial to the PC, run a similar test and see if packets are also lost.
There is no trouble with the PIC comms side - I have been writing PIC code since 1995 so I am very proficient in PIC serial comms. In fact the ALTAIR head has 3 networked PICs all working in unison (using quite complex low/high priority interrupts) and it all works fine. The master_PIC connects to the v4 and this is where it all goes wrong - I have checked the master_PIC to v4 serial link on a custom PIC terminal that I use for testing and that is working as expected. I have never used an Arduino in my life, I just like coding PICs which I am now pretty efficient at!
The thing with the PING/Arduino style tests is that you will probably never notice the missing data/packets as most will get through - it is only that I am counting/analyzing every data transaction that I am seeing the drop outs and errors. Of course with something like the B5T sensor, missing packets can throw the whole thing out.
Using the dummy send byte method from the v4 (to form a crude hand shake) with the master_PIC is working - the test has been sending live data from the master PIC and head network for 4 hours now with no errors. This is not the best option, but at least the master_PIC now has a reliable comms connection with the v4.
Tony
I think it's pretty poor to have to pretend that someone has resolved my help request just to stop (what in my opinion are) these unnecessary emails. Because of this, this will be the last time I post for assistance.
EZ-Robot, can I suggest that you allow for the situation that a help request cannot be resolved, so it stops these continuously annoying emails being sent in these cases. Thanks
Tony
Sorry now back to Tony's thread.....
Yeah, it is annoying. I started getting them even though no one had responded to one of my posts at all. Just my original post in the thread, nothing else. Steve G was kind enough to "bump" the thread to try to get some response. But the fact remains it is still unresolved, and I will continue to get notices of that fact what seems like every other day. Obviously these notices are automated, but maybe there needs to be humans looking at the situations to keep them from going out unnecessarily. Especially considering that you can be banned for not marking the thread resolved within some amorphous time period. In some cases it just takes time to get an issue resolved. Perhaps a lot of time. Getting unending notices so soon and so often does not help the situation. Worse, it can lead to simply giving up on the product altogether.
Tony, quick question for you: are you using internal RC oscillators with your PICs or external crystals?
I now get a "nag" email every day, with the usual mandatory threats.
I am quite offended by the threat that we can be banned if we do not accept that the thread has been resolved within a certain time period, when obviously on some occasions it does not! To EZ-Robot I am not going to pretend this thread has been resolved when it has not - so to all my forum friends if I suddenly disappear from the group you know why!
I will not conform to this treatment, (and being treated like some sort of errant child) so I may well get banned!
DJ and EZ-Robot, respectfully, you need to look at this seriously and urgently, or else you will lose valuable users and future customers.
My recommendation to forum members is to never use "assistance required", I am sure help will still be given from this great community.
Tony
More info on this, the master_PIC serial link to v4 has a "high priority" interrupt, so when the v4 sends a dummy (or command) byte the master_PIC stops whatever it is doing (jumps to the interrupt) and sends the latest data packet (containing sensory data etc from the networked PICs) back to the v4. The v4 has to instigate the transaction, not having interrupts on the v4 is one reason for these limitations.
Also the random disconnections are a real nuisance, the longest I have got is a run of 4 hours, but mostly there is a disconnection every 2 hours or so.
Tony
Use two EZ-Bs connected by serial (no other devices connected) and see if you can replicate the fault. If that is possible, it may identify where the issue lies. Then, with your skills, you will find a way to resolve it.
Pat
The logical challenge is a result of what you're not used to with micros. The EZ-B v4 isn't a micro, it's a program on a micro that does a bunch of stuff and communicates over WiFi. So, Tony's past experience as a micro assembler programmer is to check the UART Receive bit on the micro to instantly (within a microsecond loop) see if there is data to be read. However, on the EZ-B v4 you can only check to see if data is available at a much much slower interval (milliseconds vs microseconds). This dramatically increases the chance of checking for data while the data is still being transmitted from the serial master device.
You will always run into a situation where the number of expected bytes has not been transmitted, yet. And that's expected behavior for this design - delay, loop and the next time the data will be available.
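To put a rough number on this, the chance that a single availability check lands while a packet is only partly received can be estimated from the packet timing. This is a back-of-envelope sketch only: it assumes 8N1 framing (10 bits per byte), takes the 3-byte packet, 100 ms send period and 19200 baud from Tony's test, and treats the check as happening at an effectively random moment relative to the sender. Wi-Fi latency and ARC's own scheduling are not modelled, and the function names are illustrative.

```python
def partial_window_ms(baud, packet_bytes, bits_per_byte=10):
    """Span in ms during which at least one, but not all, bytes of a packet have arrived."""
    byte_ms = bits_per_byte / baud * 1000.0
    return (packet_bytes - 1) * byte_ms


def partial_read_chance(baud, packet_bytes, send_period_ms):
    """Fraction of randomly timed checks that would see a partly received packet."""
    return partial_window_ms(baud, packet_bytes) / send_period_ms


if __name__ == "__main__":
    chance = partial_read_chance(baud=19200, packet_bytes=3, send_period_ms=100)
    print(f"chance of catching a partial packet: {chance:.2%}")
```

With those assumptions the estimate comes out at roughly 1% of checks, so over hundreds of packets some partial reads are to be expected rather than exceptional.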
My advice is...
1) Ignore the error condition that you had created, because it will occur whenever the condition executes while data is in the middle of being transmitted.
2) Treat the condition with the same logic as the original example that I provided to you when the question was first asked. Reference this thread: http://www.ez-robot.com/Community/Forum/Thread?threadId=8067
You will notice that in my original response to you, I had written a loop which checks for the availability to be EQUAL or GREATER than the byte packet size. From that, I pull the data in increments of the packet. Here is an example, merging in your most recent code...
Code:
Some additional information on how the EZ-B v4 serial works may help set your mind at ease. The STM32 ARM micro for the EZ-B v4 has DMA for the UART receive and transmit. I use only the receive DMA, and it is enabled when the UartInit() method is called, which configures the DMA and UART parameters. The DMA has a large array in memory in which it stores the incoming bytes automatically. This is a fantastic hardware configuration and requires only configuration code and no "program" code. The data is stored automatically in a buffer, and the DMA knows its position in the array by a CPU register.
When ARC calls UartAvailable(), the EZ-B v4 returns the data in the register which is the number of bytes currently populated.
When you read the Uart, the register is _not_ reset. As you would expect, the number of bytes you read are subtracted from the register. This means it maintains the bytes which you haven't read. The only way to fully clear the register and reset it to 0 is to read all available byte data.
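The behaviour described above can be modelled with a toy buffer. The sketch below is a Python stand-in, not EZ-B firmware or EZ-Script: `RxBuffer`, `receive` and `drain_packets` are hypothetical names, and the only properties taken from the description are that the available count grows as bytes arrive and shrinks by exactly the number of bytes read. `drain_packets` follows the "equal or greater than the packet size, read in packet increments" idea.

```python
from collections import deque

PACKET_SIZE = 3  # bytes per packet, as in Tony's test


class RxBuffer:
    """Toy model of a receive buffer with an 'available' count."""

    def __init__(self):
        self._bytes = deque()

    def receive(self, data):
        """Bytes arriving from the serial line (done by hardware on the real device)."""
        self._bytes.extend(data)

    def available(self):
        """Number of bytes received but not yet read."""
        return len(self._bytes)

    def read(self, n):
        """Remove and return up to n bytes; unread bytes stay in the buffer."""
        n = min(n, len(self._bytes))
        return bytes(self._bytes.popleft() for _ in range(n))


def drain_packets(buf):
    """Read only whole packets; leave any partial packet for the next poll."""
    packets = []
    while buf.available() >= PACKET_SIZE:
        packets.append(buf.read(PACKET_SIZE))
    return packets


if __name__ == "__main__":
    buf = RxBuffer()
    buf.receive(b"\x01\x02")                  # poll lands mid-packet: only 2 bytes so far
    print(drain_packets(buf), buf.available())
    buf.receive(b"\x03\x04\x05\x06\x07")      # rest of packet 1, all of packet 2, start of packet 3
    print(drain_packets(buf), buf.available())
```

In this model a count of 2 simply means "not finished yet", and a count of 6 means two whole packets are waiting. An equality test against 3 would never match once 6 bytes have queued, which would be consistent with the stall described in the first post.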
The EZ-B v4 micro has a hardware configuration, not a software program that handles the UART. In my very extensive tests, the hardware of the STM32 micro has not had any issues.
About the "nag" email every day: I too have got those and found them annoying, but my take on it was a little different. From my view it was likely implemented to encourage folks to check whether they have a resolution, but it comes across as a little naggy. I never viewed the threat of being banned as serious or even directed at me, as I view myself as a good community member (well, at least borderline "OK", LOL), and viewed it as directed at those who would abuse the message boards.
But given your and others' feedback, my hope is that EZ-Robot will implement some verbiage changes to the notification, some logical rule changes, and some more options to denote the state of a help request.
I'm glad you brought the topic up, Tony, but my hope is that you don't leave the forum over it or take the notifications personally. I think EZ-Robot staff have been good about making changes that users bring to their attention, and I'm sure this will be improved as well.
I was going to mention the same thing, the automated responses aren't directed at the people who are regularly engaged in the forum but rather those that come and go and sometimes forget to close their thread.
Tony,
I did some tests last night with an Arduino (which has an external crystal) with your code on the ARC side and found much the same results as you did. I figured I'd try to turn over a few stones to see if anything like an internal RC oscillator might be related but, nope, it wasn't. I will have to defer to @DJ's response on this one, as he is much more knowledgeable than I am in this department.
I guess I should ask the question that hasn't been asked yet: what is your goal? Maybe we can find another route to achieve what you are going for. Are you looking for the fastest way to give real-time data from the sensors directly to the GUI? Could you possibly forgo the master PIC and use UART0, 1, 2 from the EZ-B directly to the slave PICs?
DJ, I did use your code extract and it helped a lot but did not fully stop the errors - it is the 100ms delay, as you suggest "to be friendly to the communications channel", before looking again that is the killer. This is a huge amount of lost time for the tech that we use today.
For the record, and to hopefully help others, here is my working solution that does not need such a long delay and gives a reliable data transfer:-
1) Master_PIC does its own work/processing, which can be instantly paused by a high level interrupt from the v4 (on the serial line).
2) The v4 instigates a data transfer either by sending a dummy byte or a command packet. This causes the master_PIC to jump to the HP (high priority) interrupt routine - the master_PIC now reads in the byte or (genuine) command packet.
3) I found, as DJ mentioned in his explanation, that the PIC response is too fast to send a data packet straight back to the v4, so the PIC needs to wait around 10ms (in its interrupt handler routine) and then send the packet to the v4 - it seems that any delay < 7ms brings back the errors again, as DJ explained.
Using the above method, I just finished a test run of over 40,000 data transactions between the v4 and the master_PIC and only logged one error which I can live with.
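The transaction described in steps 1) to 3) can be sketched as a small simulation. This is only an illustration of the request/response pattern in Python with a simulated clock, not Tony's PIC code or an ARC script: `FakePic` and `transaction` are hypothetical names, the 10 ms reply delay comes from the description above, and the 5 ms poll interval and 100 ms timeout are assumptions chosen for the example. Wire transmission time is ignored.

```python
PACKET_SIZE = 3
PIC_REPLY_DELAY_MS = 10   # from the description above
POLL_INTERVAL_MS = 5      # assumption for illustration
TIMEOUT_MS = 100          # assumption for illustration


class FakePic:
    """Stand-in for the master_PIC: replies a fixed time after a request byte."""

    def __init__(self, packet):
        self.packet = packet
        self.reply_at_ms = None

    def on_request(self, now_ms):
        """Models the high priority interrupt firing when the request byte arrives."""
        self.reply_at_ms = now_ms + PIC_REPLY_DELAY_MS

    def bytes_ready(self, now_ms):
        """Return the packet once the reply delay has elapsed, else nothing."""
        if self.reply_at_ms is not None and now_ms >= self.reply_at_ms:
            self.reply_at_ms = None
            return self.packet
        return b""


def transaction(pic, start_ms=0):
    """Host side: send the request byte, then poll until a full packet has arrived."""
    rx = bytearray()
    pic.on_request(start_ms)             # the "dummy byte" that triggers the PIC
    now = start_ms
    while now - start_ms <= TIMEOUT_MS:
        rx.extend(pic.bytes_ready(now))
        if len(rx) >= PACKET_SIZE:
            return bytes(rx[:PACKET_SIZE]), now - start_ms
        now += POLL_INTERVAL_MS          # simulated delay between availability checks
    return None, now - start_ms          # no complete packet within the timeout


if __name__ == "__main__":
    print(transaction(FakePic(b"\x0a\x0b\x0c")))
```

Because the host starts every exchange, it knows when a reply is due and never has to guess whether a packet is mid-flight. A timeout keeps the loop from hanging if a reply never arrives.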
Again thanks for your input.
Tony
Also note that the word errors is incorrect. "Expecting" the packet to have been transferred before transmission has ended is not an error. Again, this is a logic challenge and once it's understood that the ez-b v4 and the other serial device have no way of synchronizing...
The only option to synchronize is to do this - and ensure you're using a very high baud rate of maybe 115k or higher if possible.
Code:
The baud rate I am using is 19200, so the 3 byte packet takes under 2 milliseconds to transmit.
You are correct "expecting" is a better term for the event, I now understand how the v4 operates thanks to your guidance.
Tony