OK, guys, as you may have noticed, I kind of missed my deadline (at least in the GMT timezone).
I found a last-minute issue in Windows 10 that was driving me up the wall as I tried to figure out why it was happening so I could fix it. And indeed it should have driven me up the wall, because in the end it turned out it had nothing to do with my code, but with what Windows 10 was doing (these types of issues are REALLY HARD to diagnose, because you keep looking for a problem in your code and that is not where the problem lies).
The story of how I figured out what was happening is a bit complex, so please bear with me as I try to explain in plain terms:
Winstep applications are single threaded. This means that when a piece of code is running, it can do so knowing that the routine will run from start to end without some piece of code elsewhere in the same application running at the same time.
The disadvantage of this is that single threaded applications cannot do two or more things at the same time (for instance, rendering icons *and* responding to mouse events). To get around this we deliberately place 'interrupts' at specific points in the code which give other parts of the application a chance to run as well. It's a bit like this: do a little bit of task A, interrupt so task B can also happen, go back to doing another little bit of task A, and so on.
This way the application appears to be always responsive even if it is busy doing something, and, because we know *exactly* where the interrupts will happen (because we put them there), we can test at that point to see if task B did something that would have an influence on task A.
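To illustrate the idea, here is a minimal Python sketch (not the actual Winstep code; `pending_events`, `pump_events` and `render_icons` are made-up names for this example). Task A does one unit of work, then calls the 'interrupt' so that anything queued for task B can run, and only then carries on:

```python
from collections import deque

pending_events = deque()  # hypothetical queue of waiting user-input handlers
log = []                  # records the order in which things actually ran


def pump_events():
    """The deliberate 'interrupt': run whatever is queued, then return."""
    while pending_events:
        handler = pending_events.popleft()
        handler()


def render_icons(icons):
    """Task A: a long job that yields after each unit of work."""
    for icon in icons:
        log.append(f"render {icon}")
        pump_events()  # the ONLY point at which task B is allowed to run
        # Back in task A: this is where we would test whether task B
        # changed anything that task A depends on.


# Task B: a mouse event that arrives while task A is busy.
pending_events.append(lambda: log.append("mouse move"))
render_icons(["a", "b"])
print(log)  # ['render a', 'mouse move', 'render b']
```

Because the only place the handler can run is inside `pump_events()`, the order of events is fully predictable, which is the whole point of this approach.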
Now, applications can also be multi-threaded. Multi-threaded applications are a REAL nightmare to code properly, because tasks A, B and C are all running at the same time independent of each other. Task A can thus be interrupted by the OS at ANY time to do task B or task C, and if the application hasn't been coded properly to take this into account, via complicated methods such as semaphores and thread synchronization objects, crashes and weird things will start happening.
Anyway, going back to the story:
Some time ago I had to work around the fact that Windows itself (not my code) was triggering an unexpected interrupt when rendering thumbnails. Figuring this out was already a nightmare, because that was NOT supposed to happen at that particular point of the code. Once I figured out what and, more importantly, where it was happening, I could work around it, just as if I had placed an interrupt instruction there myself.
So, basically what happens is this: when you open a new tab for the first time, all the icons in it have to be retrieved. Since doing it for the first time normally implies retrieving those icon images from disk, it can take a very long time (in terms of computing time, of course, not human time).
If the Shelf tried to retrieve all the icons before showing the new tab, this is what would happen: you would click on a new tab and absolutely nothing would apparently happen for one second or more while the Shelf was busy retrieving all those icon images - to you, it would look as if the click had not even registered.
So, to keep the Shelf responsive at all times in a single threaded environment, a lot of tricks are employed. As soon as you click on a new tab, the Shelf begins retrieving the icons for the visible row of icons in that tab. If it manages to get all of them in under half a second, great; if not, it stops fetching further icons and shows you the new tab with the icons it already got, and generic app icons for the rest.
This ensures that when you click on a new tab you will get a response in half a second AT THE MOST.
If the Shelf didn't have time to retrieve all the icons in that half second, it will then start fetching and displaying the remaining icons one by one: fetch a new icon, display it, go back to responding to user input, fetch another icon, display it, go back to responding to user input, and so on... I call this 'lazy rendering'.
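The half-second budget followed by 'lazy rendering' could be sketched roughly like this in Python. This is only an illustration under assumptions, not the Shelf's real code: `open_tab`, `lazy_render`, `FakeClock` and `slow_fetch` are names invented for the example, and a fake clock stands in for real time so the run is repeatable.

```python
import time

GENERIC_ICON = "generic"
BUDGET_SECONDS = 0.5  # the half-second limit described above


def open_tab(items, fetch_icon, clock=time.monotonic):
    """Fetch icons until the time budget runs out; placeholders for the rest."""
    deadline = clock() + BUDGET_SECONDS
    shown, pending = {}, []
    for item in items:
        if clock() < deadline:
            shown[item] = fetch_icon(item)
        else:
            shown[item] = GENERIC_ICON  # show a placeholder for now
            pending.append(item)        # and remember to fetch it later
    return shown, pending


def lazy_render(shown, pending, fetch_icon, pump_events):
    """Fetch the leftovers one by one, handling user input in between."""
    while pending:
        item = pending.pop(0)
        shown[item] = fetch_icon(item)
        pump_events()


class FakeClock:
    """Test clock (assumption for this demo): time moves only when told to."""
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()


def slow_fetch(item):
    clock.now += 0.3  # pretend every icon takes 0.3 s to load from disk
    return "icon:" + item


shown, pending = open_tab(["a", "b", "c"], slow_fetch, clock)
first_view = dict(shown)             # what the user sees when the tab opens
remaining_after_open = list(pending)
lazy_render(shown, pending, slow_fetch, pump_events=lambda: None)
```

In this run the first two icons fit in the budget, the third starts out as a generic placeholder, and `lazy_render` fills it in afterwards.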
Now, all these icons are then cached in memory - the next time you open that particular tab, instead of being retrieved from disk the icons will now be retrieved from memory, which is several orders of magnitude faster - this is why a tab with a lot of items in it opens so much faster the second time you click on it.
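In its simplest form, such a cache is just a lookup table that is consulted before going to disk. A minimal Python sketch, assuming a hypothetical `load_icon_from_disk` function as a stand-in for the slow retrieval (the names are illustrative, not the Shelf's own):

```python
icon_cache = {}   # path -> icon already loaded in memory
disk_reads = []   # keeps track of how often we really hit the disk


def load_icon_from_disk(path):
    """Hypothetical stand-in for the slow retrieval from disk."""
    disk_reads.append(path)
    return f"<icon for {path}>"


def get_icon(path):
    """Return the cached icon, going to disk only the first time."""
    if path not in icon_cache:
        icon_cache[path] = load_icon_from_disk(path)
    return icon_cache[path]


first = get_icon("notepad.lnk")   # slow path: read from disk, then cached
second = get_icon("notepad.lnk")  # fast path: served from memory
print(len(disk_reads))  # 1
```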
Even when it has already displayed the visible row of icons, the Shelf will, unbeknownst to you, still be busy in the background fetching the icons for the next row of icons, the one you will see if you scroll down (fetch one icon, cache it in memory, go back to responding to user input, fetch another icon, etc...). When you finally do scroll down, hopefully the Shelf will already have finished retrieving all the icons in that row, so you won't have to see generic placeholder icons again.
Document thumbnails (images, videos, etc...) are also rendered in a similar way. First the Shelf displays all the icons, then it starts rendering those thumbnails one by one, replacing the icons in the Shelf with their thumbnail images as it does so (in the case of thumbnails, using that cool morph effect).
The above was an explanation of how things happen behind the scenes, so you can better understand what I ran into now:
I had already noticed, when implementing the UWP Apps tab in the Shelf, that it would sometimes display blank document icons for some Apps instead of the proper icon for that app. Since I had seen this happening in Windows itself (e.g., also in shortcuts on the Desktop to UWP apps) I figured at the time that this was a bug/issue with Windows itself.
You see, UWP apps don't use real icons like Win32 applications do (i.e., ICO files). Instead they use PNG images of several sizes stored individually in a repository folder.
Anyway, today, as I was getting some screenshots on a Windows 10 VM for the news of the v18.5 release, something a lot more serious happened: when opening the Apps tab in the Shelf, if I moused over it before it had finished rendering all the UWP app icons (and those take an unusually long time to retrieve) the application would crash with an Access Violation error.
Needless to say, I then spent a very long time looking for a problem in my code, but I couldn't find any or understand why that was happening. Simulating the icon rendering delay in the IDE on a Windows 7 machine did not cause any visible problems, much less a crash.
I finally noticed that instead of the generic app icon, the Shelf was displaying a blank document icon, but only for the icon I had just moused over. The crash did not happen immediately either: from then on, the routine that fetches and displays the icons that could not be retrieved in the first half second was also displaying blank document icons.
This was the crucial clue I needed to figure out what was happening.
You see, as I said above, UWP apps do not have real icons. So, UWP icons are basically a thumbnail of a PNG image, and I had already run into the problem of an unexpected interrupt happening when the application asks Windows to render a thumbnail.
So, this is what was happening: the Shelf would retrieve the first UWP app icon, this would take longer than half a second, so the Shelf displayed that first icon and then proceeded to display generic app icons for the rest. It would then get busy lazy rendering the icons that were still displaying the generic app icon.
In the meantime I would mouse over a UWP icon still displaying the generic app icon, which prompted the Shelf to retrieve the proper icon for that item NOW.
And this is where the problems started - you see, the routine that fetches icons, which is NEVER supposed to be interrupted in the middle of what it is doing, was being interrupted because Windows was rendering a thumbnail for the first lazy rendered UWP icon. Windows should NEVER trigger an interrupt when rendering a thumbnail, but for some unknown reason it does. So, this interrupt would allow the Shelf to respond to mouse movements, which would then trigger ANOTHER request for a UWP icon to be rendered while the first was still being processed.
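The re-entrancy described above can be reproduced in miniature. In this Python sketch (purely illustrative; `os_thumbnail_call` is a made-up stand-in for the Windows call, and the other names are invented too), the fetch routine assumes it runs from start to end, but the call it makes quietly dispatches a queued mouse event, which starts a second fetch inside the first:

```python
pending_events = []  # hypothetical queue of waiting mouse-event handlers
depth = 0            # how many fetches are in progress right now
max_depth = 0        # the highest value 'depth' ever reached


def os_thumbnail_call():
    """Stand-in for the OS call that unexpectedly lets other events run."""
    while pending_events:
        pending_events.pop(0)()


def fetch_icon(name):
    """Written on the assumption that it is never interrupted."""
    global depth, max_depth
    depth += 1
    max_depth = max(max_depth, depth)
    os_thumbnail_call()  # surprise: this can run a mouse handler
    depth -= 1


# The user mouses over another icon while the lazy fetch is in progress.
pending_events.append(lambda: fetch_icon("hovered"))
fetch_icon("lazy")
print(max_depth)  # 2
```

A depth of 2 means two icon fetches were active at the same time, even though there is only one thread.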
Now, I imagine Windows uses GDI+ to render UWP icons/thumbnails internally, and GDI+ is NOT thread safe. Basically what was happening here was the equivalent of using GDI+ in a multi-threaded environment: because GDI+ is not thread safe, things are going to go very wrong, very soon. Something would get corrupted inside the GDI+ environment, which is what caused the blank document icon to be displayed from then on for every UWP icon, and eventually GDI+ would crash, bringing the application down with it.
Once I figured this out the solution was to treat this as a multi-threaded situation and use a semaphore to make sure the Shelf does not attempt to retrieve another icon while one is still being retrieved.
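A guard of this kind might look like the following Python sketch. It is a sketch under assumptions, not the actual fix: the names are invented, `os_thumbnail_call` is a made-up stand-in for the Windows call, and putting the rejected request on a `deferred` list to retry later is my own assumption about what to do with it. Note that the semaphore is acquired without blocking, since in a single thread waiting for it would hang forever.

```python
import threading

fetch_guard = threading.Semaphore(1)  # at most one fetch at a time
pending_events = []  # hypothetical queue of waiting mouse-event handlers
deferred = []        # requests that arrived while a fetch was running
order = []           # records what actually happened, in sequence


def os_thumbnail_call():
    """Stand-in for the OS call that unexpectedly lets other events run."""
    while pending_events:
        pending_events.pop(0)()


def fetch_icon(name):
    # Non-blocking acquire: if a fetch is already in progress, back off.
    if not fetch_guard.acquire(blocking=False):
        deferred.append(name)  # assumption: retry this request later
        return False
    try:
        order.append(f"start {name}")
        os_thumbnail_call()
        order.append(f"end {name}")
    finally:
        fetch_guard.release()
    return True


def fetch_and_drain(name):
    """Fetch one icon, then handle anything that was deferred meanwhile."""
    fetch_icon(name)
    while deferred:
        fetch_icon(deferred.pop(0))


# The user mouses over another icon while the lazy fetch is in progress.
pending_events.append(lambda: fetch_icon("hovered"))
fetch_and_drain("lazy")
print(order)  # ['start lazy', 'end lazy', 'start hovered', 'end hovered']
```

The second request no longer runs inside the first; it waits until the first one has finished.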
This should serve as another example of how crashes and strange issues are not necessarily the fault of the application that does the actual crashing.
_________________ Jorge Coelho
Winstep Xtreme - Xtreme Power!
http://www.winstep.net - Winstep Software Technologies