Asked — Edited

*BEEP BEEP* Server Outage Warning

Hey everyone - this is a quick note that the Synthiam servers will be offline on Tuesday, April 18th, starting at 11:00 PM Mountain Time, for approximately 3-4 hours (hopefully less). This should not affect the operation of ARC because the local subscription cache will take priority. However, the website and community forum will be offline.

The start time in other time zones:

  • Pacific Time: 10:00 PM
  • Mountain Time: 11:00 PM
  • Central Time: 12:00 AM (April 19th)
  • Eastern Time: 1:00 AM (April 19th)
  • UTC: 5:00 AM (April 19th)

Additional time zones can be calculated here: https://www.worldtimebuddy.com/mst-to-utc-converter
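If you'd rather compute the conversion yourself, here's a quick sketch using Python's standard zoneinfo module (the year 2023 is assumed from the thread's context; the zone names are IANA identifiers):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Maintenance start: 11:00 PM Mountain Time, Tuesday April 18th
# (daylight saving is in effect in April, so this is MDT, UTC-6).
start = datetime(2023, 4, 18, 23, 0, tzinfo=ZoneInfo("America/Denver"))

for zone in ("America/Los_Angeles", "America/Chicago",
             "America/New_York", "UTC"):
    local = start.astimezone(ZoneInfo(zone))
    print(f"{zone}: {local:%b %d, %I:%M %p}")
```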

Affected Services

  • Cognitive services (Bing speech recognition, vision, emotion, face)
  • Project cloud storage
  • ARC diagnostics and logging
  • 3D-printable file downloads
  • Website (swag, purchases, documentation, community forum, etc.)
  • Account creation (from ARC)
  • Online servo profiles

User-inserted image

We're migrating the infrastructure to a faster server cluster and expanding storage. For those interested in the size of our infrastructure: the Synthiam platform consists of 5.8 million files totaling 350 GB, mostly cloud projects and the historical revisions of each saved file. I was just as surprised as you when I was informed of the file count for the platform! Each current server (website, cloud authenticator, logger, Exosphere, file manager) is 4-core with 16 GB of RAM; the new servers are 16-core with 40 GB of RAM. So we should see a significant performance improvement, as the current servers are running at full tilt with the increased usage we're experiencing.

There will be an information banner on the website throughout the day on Tuesday as a reminder. Ideally, the servers being offline shouldn't affect anything other than visiting the website; the subscription caches should take care of authentication while they're down.
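The fallback described above can be pictured with a minimal sketch (hypothetical file names and endpoint; this is not ARC's actual implementation), assuming the client stores the last successful subscription check on disk and reuses it when the server is unreachable:

```python
import json
import time
import urllib.request

AUTH_URL = "https://example.invalid/api/subscription"  # placeholder endpoint
CACHE_FILE = "subscription_cache.json"                 # hypothetical local cache
CACHE_TTL = 7 * 24 * 3600  # assumed: accept a cached result up to a week old

def check_subscription():
    """Ask the server first; fall back to the local cache if it's offline."""
    try:
        with urllib.request.urlopen(AUTH_URL, timeout=5) as resp:
            result = json.load(resp)
        with open(CACHE_FILE, "w") as f:
            json.dump({"checked_at": time.time(), "result": result}, f)
        return result
    except OSError:
        # Server unreachable (e.g. during the maintenance window):
        # reuse the cached result as long as it hasn't gone stale.
        with open(CACHE_FILE) as f:
            cached = json.load(f)
        if time.time() - cached["checked_at"] > CACHE_TTL:
            raise RuntimeError("cached subscription has expired")
        return cached["result"]
```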



PRO
Canada
#1  

What time zone is this?

PRO
Synthiam
#2  

Updated the first post

PRO
Canada
#3  

I hope your upgrade goes smoothly. I think the model cloud providers use to charge for servers (bare metal/IaaS/PaaS, etc.) needs to change. The cost increase to go from 4 cores/16 GB to servers with 16 cores and 40 GB of RAM is a really hard pill to swallow on a monthly basis, especially when the $2,000 PC on your desk probably has similar specs. Sure, cloud hosting includes power, network, backup, etc., but I have to imagine a lot of companies are going to consider going back to on-premise models and just consume SaaS services as needed, especially now that all the commercial real estate is sitting empty because no one wants to go back to the office.

PRO
Synthiam
#4   — Edited

That does open an interesting discussion. I think physical security is one of the biggest factors companies consider when choosing cloud.

Physical security is about more than theft; it includes fire, flood, electrical spikes, power outages, etc.

The next benefit would be the virtual physical locations that cloud provides. These cloud infrastructures have a huge, fast network that stretches across the world, which allows you to choose where your server is stored. Essentially, it's not stored in a single location anyway, because it's just a process that floats across multiple available resources. But the virtual physical location is determined by the IP (to clients), so you can appear anywhere - and that's good for SEO and performance.

Now, those two aside, the server costs can be quite high. But cloud also saves a company from hiring a network/server admin - so there are cost savings there. If we wanted a local server cluster, we'd need to hire someone to maintain it. From the outside, you'd probably think there isn't a lot going on behind the scenes for Synthiam's platform - and I like that we've hidden the complexities. I like that we look simple :). There are quite a few servers doing many things - especially since we work with so many universities and colleges for Exosphere and telepresence hosting. Also, the cloud project and archival history is a big feature that you'd be surprised how often is used. Consider a company where all a person or student does throughout the day is program servos: they're constantly saving project revisions 3-5 times per hour, 8 hours a day, 5 days a week. Multiply that by many users - it adds up.

I think the huge downfall I've noticed with cloud is that scalability disappears on older offerings - providers make money from customers by deprecating products. For example, our cluster was set up in 2018. It's only been 5 years, yet Azure has deprecated the server types we're using. This means there's no seamless upgrade path: we can't just push a button to grow the hard drive capacity or add more CPUs. Instead, everything has to be migrated to entirely new VMs.

If we had our own server, drives could be added or swapped in the RAID configuration to expand storage. That would have prevented this downtime.

In summary, I'm a fan of local storage - I think cloud costs are out of hand. There are no savings in the cloud like there once were. If you do local storage, an offsite backup should be the top priority. Even with our cloud, we run an offsite backup at 2 AM daily.

PRO
Canada
#5  

I agree there are pros and cons to cloud: there's no capital outlay; if you architect your solution to use stretch clusters that take advantage of multi-zone regions, you never have an outage; and you can scale up and down based on demand. There are also huge advantages to containerized microservices with Kubernetes, or to serverless computing, where you only pay for the compute used on demand.

The other gotcha I see companies miss when upgrading their servers is the software license costs. For example, you take a 2-core Oracle DB and throw it on a 16-core server with VMware or Hyper-V, pin it to 2 cores, and you get hit with a software audit telling you to pay for 16 cores despite the fact that you only had 2 vCPUs assigned. You spin up dev/test/staging environments, and the software companies say you have to license those as well. In some areas you get a win: Windows Server Datacenter edition licensed on a 4-core box is already covered up to 16 cores, so you can actually reduce costs if you consolidate to a single server with multiple VMs.

PRO
Synthiam
#6  

Okay, 5.8 million files... ugh - we've been up non-stop on this for the last week. I can't believe they deprecated the server package we had, which prevents us from simply adding more storage space and more CPUs. It's so much work because of that - so we figured out how to copy portions of the data based on how often it is accessed (there's a sketch of the approach after the list below). We're prioritizing, which means some stuff might not be available for a day or two as the files migrate. The only files that would affect anyone on this forum are cloud history revisions. Outside of that, the rest is Exosphere machine learning data, which won't affect anyone here on the forum. Those customers have been informed.

  • We've been copying the 5.8 million files for the last 3 days, and the new server is up to date as of last night. Tonight we'll do an incremental update, and a final one tomorrow when we go dark.
  • What you'll experience is all servers going dark at 11 PM Mountain Time.
  • The DNS will change immediately, but it'll point to an "undergoing maintenance" page. The TTL is 30 minutes, so we'll appear "offline" to some people until it propagates.
  • The most recent database will be copied to the new server. We have to export, compress, and copy it, so the process is CPU- and I/O-dependent. The database is 89 GB; in our migration tests it compresses to 45 GB. Compressing the archive takes about 1 hour, copying it to the new server takes about 2 hours, and uncompressing it takes another hour.
  • While the DB is being copied, we'll start an incremental backup of the high-priority portion of the 5.8 million file data folder.
  • Finally, once the high-priority files are copied, we'll run a background task to copy the rest of the lower-priority files, which will take a day or two.
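A minimal sketch of that prioritize-then-sync approach (illustrative Python only; the paths and the 30-day threshold are assumptions, and the actual tooling is custom): files are split by last-access time, and each pass copies only files that are new or newer than the destination copy, so re-running it acts as the incremental update described above.

```python
import os
import shutil
import time

HOT_DAYS = 30  # assumed threshold between high- and low-priority files

def classify(root, now=None):
    """Split files into recently-accessed (hot) and rarely-accessed (cold)."""
    now = now or time.time()
    hot, cold = [], []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            age_days = (now - os.stat(path).st_atime) / 86400
            (hot if age_days <= HOT_DAYS else cold).append(path)
    return hot, cold

def sync(src_root, dst_root, paths):
    """Copy only new or modified files, so repeated passes are incremental."""
    for src in paths:
        dst = os.path.join(dst_root, os.path.relpath(src, src_root))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if (not os.path.exists(dst)
                or os.stat(src).st_mtime > os.stat(dst).st_mtime):
            shutil.copy2(src, dst)  # copy2 preserves timestamps

# Hypothetical usage: migrate hot files first, cold files in the background.
hot, cold = classify("/data/uploads")
sync("/data/uploads", "/mnt/new-server/uploads", hot)
sync("/data/uploads", "/mnt/new-server/uploads", cold)
```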
PRO
Canada
#7  

Typically, when I had to architect cloud migrations, I would propose using a data transfer tool like Aspera, which provides real-time encryption and uses a hybrid UDP-based protocol called FASP to migrate the files. I believe Aspera is available in Azure, and I'm sure there are other alternatives that provide similar high-speed, encrypted UDP data transfer capabilities (and no, there isn't any data loss).

PRO
Synthiam
#8  

We don't need encryption because it's all within our virtual network in the cluster. UDP also requires CRC checksums during transfer, which is additional overhead - with the file count we have to move, SMB will be fine. We're not using Windows copy - I created a multi-threaded copy utility that uses the Windows file system API. It works fast enough that the I/O on the SSD is at full tilt. Funny, the data folder is still called "ez-robot uploads" because it's been that way since 2011 :D.
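For illustration, here's a rough Python stand-in for that kind of multi-threaded copy (the real utility is a custom Windows tool; the worker count here is an assumption). Threads release the interpreter lock during file I/O, so a pool keeps the SSD busy even with millions of small files:

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_one(job):
    """Copy a single (src, dst) pair, creating the destination folder."""
    src, dst = job
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)

def copy_tree_threaded(src_root, dst_root, workers=16):
    """Walk the tree once, then copy files in parallel; with many small
    files, per-file overhead (not bandwidth) is usually the bottleneck."""
    jobs = []
    for dirpath, _, names in os.walk(src_root):
        for name in names:
            src = os.path.join(dirpath, name)
            jobs.append((src, os.path.join(dst_root,
                                           os.path.relpath(src, src_root))))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, jobs))
```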

We will migrate to our locally hosted servers next - because this is ridiculous. Ironically, one of our ex-employees was chatting with me last night. He reminded me that in 2018, when we spun up the current production servers, Azure did the same thing - deprecated the "server package" - and we had to copy and rebuild everything. So it appears Microsoft deprecates server packages to make people pay for migration.

User-inserted image

PRO
Synthiam
#9  

Boom! It looks like everything is running! The lower-priority files are still being copied and will take a few days, but I doubt anyone will notice because they're robot project file revisions that are years old. Everything went off without a hitch!

#10   — Edited

Whoo, that was stressful! :p Hey, just kidding - I know it was a lot of work. I'd say good job, but I'm not qualified; my job here is to enjoy all your hard work. Thank you!

Since the subject of storage and file size was brought up: how big of a project can be uploaded to your EZ Cloud? I had to stop saving my B9 Robot project to your cloud a couple of years ago because it was too large and wouldn't save. It's currently somewhere over 55 MB and growing.

EDIT: OK, I found the allowed file size in the ARC Download section: 128 MB for Pro and Team subscribers. My project is half of that, so I guess I need to try saving to the cloud again. Something must have gone wrong back then - maybe a timeout kicked in.

PRO
Synthiam
#11   — Edited

To remember the smoothest migration I've ever experienced, I think it's good to archive our update notice from during the process! Everyone did a wonderful job!

User-inserted image

PS, the best part was that the "work it" link pointed to https://youtu.be/gAjR4_CbPpQ?t=17 - that turned out to be the theme song during the migration :D

Unknown Country
#12   — Edited

Congrats on the smooth migration! And thank you very much for the prior heads-up warning, too - it was greatly appreciated!