Interesting 10.2.300 Performance Issue

performance

(Dan Edwards) #1

Just some background information before I jump into a problem I have encountered. I first noticed this issue one day while working through sporadic internet outages, but did not pay much attention to it. My next encounter came during a customer engagement where the customer happens to have an “air gap” network.

The testing, shown in the attached video, happens in a dedicated Test environment but was also reproduced in several other environments.

The server has Epicor 10.2.200 and 10.2.300 installed, and each uses a DEMO database hosted on the same server as the application, running on SQL 2016. The server OS is Windows 2016. Basically, everything including the client is run from the server for this test. In other testing I used a split app server and SQL server with the client on a workstation, and got the same result as shown here.

Testing Environment:
image

Test 1:

With Internet access enabled, I was able to open both 10.2.200 and 10.2.300, as shown in the video, and launch the Change Log Report screen (the issue happens with any screen – this was just for an example). Once the screen opens – which is fast – I was able to immediately drag the screen around.

Test 2:

This test comes right after Test 1 and the only difference is that I shut down the interface that provides the Internet connectivity. You can see in the video that I issue the shutdown and immediately the continuous ping to google.com stops.

I then launch the 10.2.200 Change Log Report screen and it opens fast and I am able to move it around immediately.

I then perform the same test using 10.2.300. The screen opens immediately, but I have to wait up to 40 seconds, at least this first time, before the screen becomes active. After that initial launch I can finally close it, and from then on it, or any other screen, takes 10 seconds before giving me control. This goes on indefinitely and never gets better until Internet access is back.

Notes of Interest:

1 – If I simply remove the default gateway from the server – which will also disable Internet – it does cause the same problem (tested on several different environments) as long as certain conditions apply to routing. If I change the default gateway to just a bogus IP – same issue as shutting down the router interface.

2 – I disabled all CRL checks to make sure I was not running into issues with that

3 – I am still in the middle of troubleshooting, but one thing that really caught my eye was this from ProcMon. I am not entirely sure what this call is quite yet but will know soon. Notice that the process encountering this TCP reconnect is Epicor.
image

4 – Any version 10.2.200 and earlier works fine.

5 - A bunch of attempted Azure traffic

Video
10.2.300 Feature.mp4 (3.3 MB)


(Chris Conn) #2

Related? After E10.1.400.22 to E10.2.300.2 upgrade, client hangs for 2 minutes after each program (window) is opened


(Dan Edwards) #3

Yes - sounds the same and this is 10.2.300.4 (fails on all 10.2.300.xx releases) and the interesting part is that 10.2.200 works (which kind of rules out the CRL check). The sad part is if you have Internet up/downs or longer slow periods you WILL notice it in application performance.


(Chris Conn) #4

Maybe we can log and find out what it's sending/receiving on the net. https://www.maketecheasier.com/track-internet-activity-windows-firewall-log/

Update - oh, I see you logged Azure traffic


(Dan Edwards) #5

Yep - Between a ProcMon dump and Wireshark I have plenty to comb through :slight_smile:


(Chris Conn) #6

So, if we know what we're looking for, with a little luck maybe we can identify the offending methods and make it easier for Epicor to fix it


(Dan Edwards) #7

I am on that path right now and hope to come up with something tonight.


(Dan Edwards) #8

Well - I did not fully solve it but am able to get around it. The slowness seems related to the use of Brightcove, and it looks like it is integrated into each menu item (except a few) for something help related. To test, I added a host entry on the server for 127.0.0.1 edge.api.brightcove.com and BAM the speed was back.
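For anyone applying the same workaround, here is a minimal Python sketch for confirming that the override is actually present in a hosts file. The hostname and the 127.0.0.1 target come from the post above; the helper name `hosts_override` and the sample text are only illustrative. On Windows the hosts file normally lives at `C:\Windows\System32\drivers\etc\hosts`, so the file's contents would be passed in place of the sample.

```python
from typing import Optional

BRIGHTCOVE_HOST = "edge.api.brightcove.com"  # hostname named in the post above


def hosts_override(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the IP that hosts_text maps hostname to, or None if there is no entry."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop anything after '#', then split into: IP, name, [aliases...]
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and wanted in (name.lower() for name in entry[1:]):
            return entry[0]
    return None


# Illustrative hosts content; on a real server, read the hosts file instead.
sample = (
    "127.0.0.1 localhost\n"
    "127.0.0.1 edge.api.brightcove.com  # workaround from this thread\n"
)
print(hosts_override(sample, BRIGHTCOVE_HOST))
```

Pointing the name at 127.0.0.1 presumably makes the connection attempt fail quickly instead of waiting on an unreachable network, which would explain why the speed came back.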

@aidacra @Bart_Elia - any additional insight? Can we disable this elsewhere?


(Chris Conn) #9

And I was just about to ask if you had opted out of telemetry.


(Dan Edwards) #10

I opted out of telemetry a long time ago :slight_smile: and it does not have any impact on whatever this is doing. I even shut down Windows telemetry.


(Chris Conn) #11

I just hate that you are working on this on a Saturday night, unless you are having a blast, that is. I looked up Brightcove and it was a little curious. It seems related to video, but I just couldn't make any link to what it would be used for in Epicor, especially in every menu item. I haven't stayed in the loop much on the Kinetic stuff; has it been working its way into the release version? I only ask because I saw some mention of similar auth problems with a connector, which kind of made me think about Kinetic.


(Brinda Whitaker) #12

Is Brightcove the video provider for Epicor’s e-learning?


(Dan Edwards) #13

Yes - they provide all the video behind the new active homepage and Knowledge On Demand


(Nathan your friendly neighborhood Support Engineer) #14

Great troubleshooting! I have no idea if there is a simple way to disable these calls :confused: I’m finding out.


(Tim VonDerHaar) #15

@danbedwards - Thanks for the detail. Work around worked for me as well. Following this thread to see what the permanent fix will be.


(Bart Elia) #16

@aidacra is pushing on the issue in the new Knowledge On Demand functionality.

For those reading along at home, the Telemetry system is built on the same framework as the tracing subsystem. It pushes data to a background queue and lets the client go about its day. In a background task the queue is flushed to disk for trace, to cloud for telemetry. This risks losing a few ms of data but the benefit of not being a blocking, synchronous call is worth it.
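The queue-and-flush pattern described above can be sketched in a few lines of Python. This is not Epicor's code, just an illustration of the general technique; the class name `BackgroundSink` and the `flush` callback are hypothetical.

```python
import queue
import threading


class BackgroundSink:
    """Accepts records without blocking the caller; a worker thread flushes them."""

    def __init__(self, flush):
        self._queue = queue.Queue()
        self._flush = flush  # hypothetical callback: write to disk, or send to cloud
        # Daemon thread: records still queued when the process exits are lost,
        # which is the "few ms of data" trade-off mentioned above.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, record):
        # Returns immediately; the slow I/O happens on the worker thread.
        self._queue.put(record)

    def _run(self):
        while True:
            record = self._queue.get()
            if record is None:  # sentinel: stop the worker
                break
            try:
                self._flush(record)
            except Exception:
                pass  # a failed flush drops the record instead of stalling the client

    def close(self, timeout=5.0):
        # Ask the worker to stop once everything queued before this call is flushed.
        self._queue.put(None)
        self._worker.join(timeout)


# Usage: the "flush" here just appends to a list.
written = []
sink = BackgroundSink(written.append)
sink.emit("form opened")
sink.close()
```

The point of the design is that `emit` never waits on the network, so an unreachable endpoint slows only the worker thread and not the screen the user is trying to open.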

Nathan might reach out if you don’t already have a ticket.


(Charlie Lloyd) #17

This is related to a feature added in 300 to provide context-sensitive video help to forms. There are about 5 forms that use it now, but yes, the call to see if videos exist on a form is in a base form layer. Try Customer Maintenance to see it work functionally. On Customer Maintenance go to Help > Video Help.
The issue has been entered and I’ll report back here soon.
Basically the code executes a call to the Brightcove REST API (our video hosting partner) that says “are there playlists for this form”. The call does use a C# dynamic object, which bears looking into as a performance issue.
We haven’t noticed the performance hit in-house and I wonder if the Brightcove URL is blacklisted on your network, Dan?
Regardless, we’ll look at the issue, check about externalizing the URL (to make it visible to be whitelisted), and/or add a flag to enable/disable the feature.


(Charlie Lloyd) #18

and thanks Dan for providing the work around!


(Charlie Lloyd) #19

Oh sorry, I didn’t read far enough back into the thread. Seems like the fix might be to set a (new?) global flag on a background thread when Epicor is launched to indicate internet access status. Forms or whatever could subscribe to the flag before running code that seeks responses from the internet.
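A rough Python sketch of that idea, for readers who want to see the shape of it. Everything here is an assumption for illustration (the names `ConnectivityFlag`, `tcp_probe` and `fetch_if_online`, the probe target, and the 30 second interval); it is not the fix Epicor shipped. It also has forms read the flag rather than subscribe to change events, which is a simplification.

```python
import socket
import threading


class ConnectivityFlag:
    """Keeps a shared 'internet reachable' flag current from a background thread."""

    def __init__(self, probe, interval=30.0):
        self._probe = probe        # callable returning True when the network is reachable
        self._interval = interval  # seconds between probes (assumed value)
        self._online = threading.Event()  # starts cleared, so callers skip until proven online
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    @property
    def online(self):
        return self._online.is_set()

    def check_once(self):
        """Run the probe one time and update the flag."""
        try:
            ok = bool(self._probe())
        except Exception:
            ok = False
        if ok:
            self._online.set()
        else:
            self._online.clear()

    def _run(self):
        while not self._stop.is_set():
            self.check_once()
            self._stop.wait(self._interval)  # sleeps, but wakes early on stop()

    def start(self):
        self._thread.start()

    def stop(self, timeout=5.0):
        self._stop.set()
        self._thread.join(timeout)


def tcp_probe(host="edge.api.brightcove.com", port=443, timeout=2.0):
    """Example probe: can a TCP connection be opened? (The timeout does not bound DNS lookup.)"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_if_online(flag, fetch):
    """Skip the internet-dependent call entirely when the flag says offline."""
    return fetch() if flag.online else None
```

At client launch one would create `ConnectivityFlag(tcp_probe)` and call `start()`; a form would then wrap its video-help lookup in `fetch_if_online`, so an offline client pays for the slow probe only on the background thread.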


(Chris Conn) #20

That seems like a beautiful solution. Thanks to all the Epicor folks here helping us out!