CodeWorth: ScreenTranslator

At work, I have to use QQ, an instant messaging program that comes in an International version and a Chinese version. For the Chinese version of the app, I mostly have to rely on Google Lens to translate the UI into English.

So, about 3-4 days ago, I thought: why not make a desktop program to do the translation, instead of having to open Google Lens on my phone every time I want to make sense of a weird menu item in QQ's UI? This is my attempt at that. All it does is iterate through the automation/accessibility elements in the app, get an English translation for each element's text, and create an overlay window with the translated text right over the element. I first wanted to use Google's Translate API, but that seems to require a credit card even for the free tier, so I settled for a free API that uses LibreTranslate. The translations are not as good as Google Translate's, but one can make out the meaning with some deliberation.
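As a rough illustration of the translation step, here is a minimal C++ sketch that builds the JSON body for one element's text. It assumes the free API follows the stock LibreTranslate `/translate` request shape (`q`, `source`, `target`, `format`); the endpoint the tool actually talks to may differ, and the helper names `jsonEscape` and `buildTranslateBody` are made up for this example. Sending the request over HTTP is left out.

```cpp
#include <cstdio>
#include <string>

// Escapes a UTF-8 string so it can sit inside a JSON string literal.
std::string jsonEscape(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX escapes.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", c);
                    out += buf;
                } else {
                    out += static_cast<char>(c); // UTF-8 bytes pass through unchanged
                }
        }
    }
    return out;
}

// Builds the request body for one piece of UI text.
// Assumption: the server accepts the standard LibreTranslate fields.
std::string buildTranslateBody(const std::string& text,
                               const std::string& source,
                               const std::string& target) {
    return "{\"q\":\"" + jsonEscape(text) +
           "\",\"source\":\"" + jsonEscape(source) +
           "\",\"target\":\"" + jsonEscape(target) +
           "\",\"format\":\"text\"}";
}
```

Calling `buildTranslateBody(menuText, "zh", "en")` once per element gives the payload to POST; a stock LibreTranslate server answers with a JSON object containing a `translatedText` field.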

UI of the program

UI of the program during the translation process

QQ's menu before translation

QQ's menu after translation

Operating the program is simple: middle-click on any window to translate the text in it. Move your cursor to the top left corner of your screen to remove all translation overlays.

The program doesn't work for entire webpages because webpages tend to have a lot of elements to process (in my case, with multiple tabs open and one of them being a Chinese site, I counted over 10k elements). The C# library I've used here for extracting automation elements doesn't cope with a large number of elements and simply returns an empty result. It should work in scenarios with a limited number of elements, such as software programs with their UI in a foreign language.

Here's the GitHub repo and here's the release

Here's the MDD:

Musings During Development of ScreenTranslator:
------------------------------------------------

Don't ever create Forms on multiple threads. Use the main UI thread for all your window needs; you'll save yourself a lot of headaches. I had tried creating each translation overlay on its own thread and ran into several problems. Residual windows were left behind when the thread they were created in was aborted, so I had to interact with the overlays to get rid of them. There was the design question of how best to close the overlay windows: abort the threads they were created in, or make the overlay forms themselves listen for a flag? Do I show a new window using .Show() followed by Application.Run(), or .ShowDialog(), and why? At any rate, when a thread was aborted externally, the ThreadAbortException was only raised inside it once the cursor was placed over its overlay, not immediately. On top of that, some translations were randomly missed for whatever reason.
Just do whatever processing actually made you take the multi-threading route on separate thread(s), and leave the UI/form creation logic to the main UI thread. this.Invoke() and this.BeginInvoke() are your friends. All of the problems I described above went POOF when I did that. There's a reason countless StackOverflow posts recommend exactly this. If you're creating forms on separate threads, rethink your design. You will probably make it far simpler and solve most of your problems by leaving UI operations (and that includes creating new forms) to the main UI thread.
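The shape of that design, stripped of WinForms, can be sketched in portable C++. This is only an illustration of the idea, not the tool's code: `UiQueue`, `post`, `drain` and `translateOnWorkers` are hypothetical names, `post` plays the role of BeginInvoke(), `drain` stands in for the UI thread's message loop, and the "translation" is a placeholder string operation.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the UI thread's message queue.
class UiQueue {
public:
    // Callable from any thread: hand work over to the UI thread.
    void post(std::function<void()> work) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(work));
    }

    // Call only from the UI thread: run everything queued so far.
    // Returns the number of callbacks that were executed.
    std::size_t drain() {
        std::queue<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, pending_);
        }
        std::size_t ran = batch.size();
        while (!batch.empty()) {
            batch.front()();
            batch.pop();
        }
        return ran;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};

// Does the slow part (translation) on worker threads, but "creates overlays"
// (here: appends to a vector) only on the calling thread.
std::vector<std::string> translateOnWorkers(const std::vector<std::string>& texts) {
    UiQueue ui;
    std::vector<std::string> overlays; // touched by the calling thread only
    std::vector<std::thread> workers;

    for (const auto& text : texts) {
        workers.emplace_back([&ui, &overlays, text] {
            std::string translated = "[en] " + text; // placeholder for the slow API call
            ui.post([&overlays, translated] { overlays.push_back(translated); });
        });
    }
    for (auto& worker : workers) worker.join();

    ui.drain(); // every overlay is created here, on one thread
    return overlays;
}
```

Because the workers never touch the overlay list themselves, there is nothing to clean up when a worker ends, and closing the overlays becomes a plain loop on the one thread that owns them. The order of the results depends on which worker finishes first.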

The .NET Form's TopMost property is crap, like a lot of other things (the .NET Automation API comes to mind, because it's also something I've used in this project, or rather, something I've avoided in favor of CUIAutomation, the wrapper around the native automation API). Just use the native Win32 API instead: SetWindowPos() with the right params (for example, HWND_TOPMOST as the insert-after handle).

September 4, 2023 | 08:24 PM
-----------------------------
During automation tree iteration, you won't see what's not visible. I was trying to get all descendants of a web browser (Firefox) to see if my tool could translate entire webpages, but it didn't work. While investigating, I found that the CUIAutomation library was flat-out returning null from the FindAll() method when given the IUIAutomationElement returned by a call to ElementFromHandle() (with the hwnd obtained by applying the WindowFromPoint() API to the value returned by GetCursorPos()). To see what was happening, I opened Spy++ to check whether I was getting the right hwnd, and found that the hwnd I had obtained from WindowFromPoint() was a child of a parent hwnd for Firefox. I then got the parent hwnd and tried FindAll() with descendants as the tree scope, but I was still getting empty or null results. Then I thought that the CUIAutomation library must be at fault here; after all, FindAll() with descendants is a very taxing operation, as suggested by MSDN. So, I made a fully functional program for the same thing in C++:
`
#include <iostream>
#include <string>
#include <vector>
#include <Windows.h>
#include <UIAutomationClient.h>

IUIAutomation* g_pAutomation = nullptr;

void doItFaster();
std::vector<IUIAutomationElement*> GetDescendants(IUIAutomationElement* element);
std::vector<IUIAutomationElement*> GetChildren(IUIAutomationElement* parentElement);

int main()
{
    doItFaster();
}

void doItFaster() {
    // S_FALSE only means COM was already initialized on this thread.
    if (FAILED(CoInitialize(NULL))) return;

    HRESULT hr = CoCreateInstance(__uuidof(CUIAutomation), NULL, CLSCTX_INPROC_SERVER,
                                  __uuidof(IUIAutomation), (void**)&g_pAutomation);
    if (SUCCEEDED(hr)) {
        // Hard-coded window handle (e.g. read from Spy++); replace with a live one.
        HWND hWndFirefox = (HWND)0x0006009E;
        IUIAutomationElement* pBrowserElement = nullptr;
        if (g_pAutomation->ElementFromHandle((UIA_HWND)hWndFirefox, &pBrowserElement) == S_OK) {
            auto leafElements = GetDescendants(pBrowserElement); // watch this

            std::vector<std::wstring> leafElementNames;
            for (auto leafElement : leafElements) {
                BSTR name = NULL;
                if (leafElement->get_CurrentName(&name) == S_OK && name != NULL) {
                    leafElementNames.push_back(std::wstring(name, SysStringLen(name))); // and this
                    SysFreeString(name);
                }
                leafElement->Release(); // drop the reference taken in GetDescendants
            }

            std::wcout << leafElementNames.size() << L" named leaf elements\n"; // bp here
            pBrowserElement->Release();
        }
        g_pAutomation->Release();
    }
    CoUninitialize();
}

// Returns the leaf elements under element (or element itself if it has no children).
// Every returned pointer carries its own reference, which the caller must Release().
std::vector<IUIAutomationElement*> GetDescendants(IUIAutomationElement* element)
{
    std::vector<IUIAutomationElement*> leafElements;

    auto children = GetChildren(element);

    if (children.empty()) { // this is a leaf element
        element->AddRef();
        leafElements.push_back(element);
    }
    else {
        for (auto child : children)
        {
            auto descendants = GetDescendants(child);
            // Append at the end: keeps tree order and avoids shifting the vector each time.
            leafElements.insert(leafElements.end(), descendants.begin(), descendants.end());
            child->Release(); // a leaf keeps the extra reference added above
        }
    }

    return leafElements;
}

// Returns the immediate children of parentElement; the caller owns each reference.
std::vector<IUIAutomationElement*> GetChildren(IUIAutomationElement* parentElement) {
    std::vector<IUIAutomationElement*> retval;

    IUIAutomationCondition* trueCondition = nullptr;
    if (g_pAutomation->CreateTrueCondition(&trueCondition) != S_OK) return retval;

    IUIAutomationElementArray* children = nullptr;
    if (parentElement->FindAll(TreeScope_Children, trueCondition, &children) == S_OK && children != nullptr) {
        int numberOfChildren = 0;
        if (children->get_Length(&numberOfChildren) == S_OK) {
            for (int i = 0; i < numberOfChildren; i++) {
                IUIAutomationElement* child = nullptr;
                if (children->GetElement(i, &child) == S_OK) {
                    retval.push_back(child);
                }
            }
        }
        children->Release();
    }
    trueCondition->Release();

    return retval;
}

`
It took doing this for me to realize what was actually happening: the web browser needed to be almost completely visible, i.e. not blocked by any other window, for the element extraction to work. My Visual Studio 2019 IDE was most definitely blocking the browser window while the program was running.
ref: https://stackoverflow.com/questions/69122441/uiautomation-missing-to-catch-some-elements
There's no need to write a custom FindAll-descendants method with recursion like I did in the C++ version above. The built-in FindAll() with the descendants tree scope will work just fine. I only did what I did because I thought the stack was overflowing or something.
Also, the hwnd to use for the parent element passed to FindAll() is not the one returned by WindowFromPoint(). To be sure, you need to get the top-level parent window of the hwnd that API call returns (GetAncestor() with GA_ROOT is one way to do that). The raw hwnd might work for apps such as QQ (in my case it worked perfectly fine with QQ, but clearly didn't when I tried translating a Chinese website, which is what led to all of this), but it doesn't work for all windows, especially web browsers.

September 5, 2023 | 03:39 PM
----------------------------
Turns out, the FindAll() method provided by the IUIAutomation C# NuGet package doesn't give results (i.e. it gives a 0-length IUIAutomationElementArray) if there's a large number of elements. I tried parsing a Firefox window browsing csdn.com, a Chinese-language site, and the C++ code above (my own version of FindAll() for getting all descendants) took around 30s but returned over 10k elements. I then ran FindAll() with the descendants tree scope in C# for the same hwnd/element and it returned immediately with a length of 0 (not null, but an IUIAutomationElementArray with length 0). Maybe porting the C++ version of GetDescendants() to C#, manually recursing through all immediate children, would work for C# as well, but I'm not going to do it. Just don't use it on websites.
For the Brave browser with my Outlook open in a tab, both the default FindAll() with descendants in C# and the C++ version work just fine, and there are around 370 elements.
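For anyone who does want to port the manual traversal, the leaf collection can also be written with an explicit stack so that a deep tree never touches the call stack. The sketch below is a hypothetical, UI-Automation-free version: `Node` stands in for an automation element and `collectLeafNames` for GetDescendants(); neither is part of the tool.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a UI Automation element.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Collects the names of all leaf nodes in tree order,
// using an explicit stack instead of recursion.
std::vector<std::string> collectLeafNames(const Node& root) {
    std::vector<std::string> leaves;
    std::vector<const Node*> stack{&root};

    while (!stack.empty()) {
        const Node* node = stack.back();
        stack.pop_back();

        if (node->children.empty()) { // this is a leaf element
            leaves.push_back(node->name);
            continue;
        }
        // Push children in reverse so the first child is visited first.
        for (auto it = node->children.rbegin(); it != node->children.rend(); ++it) {
            stack.push_back(&*it);
        }
    }
    return leaves;
}
```

With real elements, the `children` lookup would be one FindAll() call with the children tree scope per node, which is where the roughly 30 seconds for 10k elements would go.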

September 6, 2023 | 11:50 AM
------------------------------
https://stackoverflow.com/questions/13225841/starting-application-on-start-up-using-the-wrong-path-to-load
When the program runs at user logon (via an HKCU Run entry), the working directory is not the location of the application's exe. So, file paths that haven't been fully qualified don't resolve.
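The usual fix is to resolve such files against the executable's own folder instead of the working directory. A minimal C++17 sketch of that idea follows; `resolveNextToExe` is a hypothetical helper, and the full path of the executable is assumed to be obtained elsewhere (on Windows, for instance, from GetModuleFileNameW()).

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Resolves a file name against the folder that contains the executable,
// so the result does not depend on the current working directory.
// exePath is assumed to be the full path of the running executable.
fs::path resolveNextToExe(const fs::path& exePath, const fs::path& file) {
    if (file.is_absolute()) {
        return file; // already fully qualified, leave it alone
    }
    return (exePath.parent_path() / file).lexically_normal();
}
```

Every relative path the program opens (settings, logs, and so on) would go through this helper once at startup, which makes a Run-key launch behave the same as a double-click in Explorer.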

Wednesday, September 6, 2023