Tuesday, December 2, 2025

Potential flow simulator

The theory of potential flows is something I got to learn in CIVE 6370 - Environmental Fluid Mechanics and Turbulence - a course I took this semester, the second of my first year in this PhD program. Fourier transforms and finite difference methods were two other really interesting topics from the other course I took this semester, CIVE 7397 - Applied Mathematics for Engineers, but I didn't make anything with a GUI attached to that project work (it's just the solution of the advection-diffusion equation using two finite difference schemes implemented in Python - pretty standard stuff). The former, though, I thought deserved a write-up of its own.

This is a JavaScript web application that lets a user add, modify and remove any number of flow elements and simulate the resulting flow field. Currently, the available flow elements include uniform flow, source/sink and free vortex flows. The supported output quantities are the velocity potential (Φ), stream function (Ψ), velocity (V) and coefficient of pressure (Cp).

The basic method is this: calculate the selected quantity for each element placed in the domain, add up their individual contributions (superposition), and display the result.

From theory, the two fundamental quantities - velocity potential and stream function - for the three supported flow elements are:


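For reference, the standard expressions, written in polar coordinates $(r, \theta)$ centered on the element (sign conventions vary between textbooks), are:

$$
\begin{aligned}
\text{Uniform flow of speed } U \text{ along } x:\quad & \Phi = U\,r\cos\theta, \qquad \Psi = U\,r\sin\theta \\
\text{Source/sink of strength } m:\quad & \Phi = \tfrac{m}{2\pi}\ln r, \qquad \Psi = \tfrac{m}{2\pi}\,\theta \\
\text{Free vortex of circulation } \Gamma:\quad & \Phi = \tfrac{\Gamma}{2\pi}\,\theta, \qquad \Psi = -\tfrac{\Gamma}{2\pi}\ln r
\end{aligned}
$$

For a combination of elements the contributions simply add ($\Phi = \sum_i \Phi_i$, $\Psi = \sum_i \Psi_i$), the velocity components follow as $u = \partial\Phi/\partial x$ and $v = \partial\Phi/\partial y$, and the pressure coefficient as $C_p = 1 - (|V|/U_\infty)^2$, with $U_\infty$ the reference uniform-flow speed.
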
If the user selected the stream function or velocity potential, it is calculated directly at each grid point and a contour plot is generated. That is the end of it. If the user selected velocity, the velocity potential is still calculated at each grid point, and then a numerical differentiation is performed at each location to obtain the horizontal (u) and vertical (v) velocity fields separately (except at the last row and column).

These two fields are then used to construct a scatter plot in Plotly.js - the plotting library used here - to depict the velocity vector field. This turned out to be the most expensive part in terms of compute/rendering. Since the library doesn't natively support drawing vector arrows, I found that scatter plots could do the job: the two ends of a vector are drawn as a line segment joining two points - their exact locations determined by the magnitudes of u and v at that point - followed by a null data point (so that consecutive arrows wouldn't get connected together). Similar story for the arrow tips and their rotation angles. Sampling a small enough number of grid points to draw these arrows on was another non-trivial issue. The default domain of -50 to +50 along both axes with delX and delY of 0.1 each means a million grid points. Plotly chokes at such numbers, and the resulting vector field would be too dense to make anything out anyway. So picking a reasonable fraction of those points fast enough took some time as well. At any rate, as things stand, retrieving 2000 points and plotting their vector representations still takes over 10 s, during which the whole page freezes. Maybe Web Worker threads could handle this better, but the real solution would be to find a better plotting library.

Funny thing is, before realizing that this rendering process was the bottleneck, I blindly assumed it must be the numerous loops involved that were slowing the application down, and quickly jumped to porting them to GPU kernels using GPU.js (an awesome library that I had already tried in the past). Only after I had finished that did I understand it made no noticeable difference where the loops ran, at least for the default domain parameters.
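The computation itself is simple enough that a few lines of NumPy capture the idea. The app is plain JavaScript; this sketch, with made-up grid spacing, sample potential, sampling step and arrow scale, is only meant to illustrate the forward-difference velocities and the null-separated scatter trick described above:

```python
import numpy as np

# Grid and an illustrative potential: uniform flow (U = 1) plus a source (m = 5) at the origin
dx = dy = 0.1
x = np.arange(-50, 50 + dx, dx)
y = np.arange(-50, 50 + dy, dy)
X, Y = np.meshgrid(x, y)
phi = 1.0 * X + (5.0 / (2 * np.pi)) * np.log(np.hypot(X, Y) + 1e-12)

# Forward differences for u = dphi/dx and v = dphi/dy
# (the last row and column are dropped, as noted above)
u = (phi[:-1, 1:] - phi[:-1, :-1]) / dx
v = (phi[1:, :-1] - phi[:-1, :-1]) / dy

# Sample a manageable subset of points and build one scatter "lines" trace:
# each arrow shaft is (tail, tip, None) so the segments aren't joined together
step = 25      # plot roughly every 25th grid point in each direction
scale = 0.5    # shrink arrows so they don't overlap
xs, ys = [], []
for i in range(0, u.shape[0], step):
    for j in range(0, u.shape[1], step):
        x0, y0 = X[i, j], Y[i, j]
        xs += [x0, x0 + scale * u[i, j], None]
        ys += [y0, y0 + scale * v[i, j], None]

# xs/ys go into a single Plotly scatter trace in "lines" mode; arrowheads are
# built the same way, using each segment's angle (atan2(v, u)) for rotation.
```
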

Another issue plaguing the application right from the beginning was that of phase jumps. In short, in contour plots involving phase values - basically the angle a point makes with an element's location, measured from the positive x-axis - there is a discontinuity wherever the angle wraps around: a raw atan2() jumps from -π to +π across the negative x-axis of the element, while an angle corrected to always be positive (0 to 2π, measured counterclockwise from the positive x-axis) jumps across the positive x-axis instead; with several elements summed, these cuts can show up on both sides. This manifests in contour plots as a horizontal strip of dark lines drawn very close to one another, due to the large difference in phase values over a very short distance.
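To see the jump in isolation (a standalone sketch, not code from the app):

```python
import numpy as np

# Phase just above and just below the negative x-axis (x = -1, y = +/-0.001)
above = np.arctan2(+0.001, -1.0)   # ~ +3.1406, just under +pi
below = np.arctan2(-0.001, -1.0)   # ~ -3.1406, just above -pi
print(above - below)               # ~ 2*pi: the contour routine sees a huge gradient here

# Mapping angles to [0, 2*pi) only moves the cut to the positive x-axis
above_pos = np.arctan2(+0.001, 1.0) % (2 * np.pi)   # ~ 0
below_pos = np.arctan2(-0.001, 1.0) % (2 * np.pi)   # ~ 2*pi
print(below_pos - above_pos)       # again ~ 2*pi
```
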


Initially, I thought I must have made some error in the code that would only show up in non-symmetric flows (e.g. a source-sink pair not perfectly along the x-axis). I later realized that this problem - phase unwrapping - is actually a pretty common occurrence in contour plots and a field of study of its own. I didn't bother going too deep into it once I learned about its ubiquity (and, in equal part, its stubbornness to go away), as long as the contour lines could be made out (the curious heatmap values notwithstanding). I did include an attempt to correct the issue in the code, but later decided to skip it behind a flag after it became clear it wasn't going to help much.

That's about the breadth of what I wanted to write here. Other than the above, the general looseness of JavaScript and the spaghettification of code that usually follows when a project starts out as a quick-and-dirty snippet to get a thing done don't need much expounding upon. GPU.js was a delight to work with, and seeing it in action was still a great feeling.

Here's the GUI of the application that I prepared for my presentation:


Here's the GitHub repo

Here's the live webpage

Friday, June 6, 2025

A bit about Dynamic Programming and Reinforcement Learning

These days I've been working towards my PhD. As part of my research work, I have had to understand Reinforcement Learning (RL) at a fundamental level. Searching for materials on the topic led me to Dynamic Programming (DP) - a subject I had previously studied as part of my Master's degree curriculum. It turns out DP is pretty closely related to RL. The problems they are designed to solve are pretty similar, if not the same - maximizing some kind of sum of rewards in the context of an environment embodied by a Markov Decision Process (MDP). MDPs are a big rabbit hole of their own, so I'm going to refrain from going into the details. The important bit is that both fields - DP and RL - are trying to do the same thing, but one (RL) is more general than the other (DP). Historically, DP came first, followed by RL. To solve a problem with DP, you need to know completely how its environment - the governing MDP - works. There is no such hard requirement in RL. However, for that convenience, RL gives up DP's promise of an exact solution. For a lot of practical problems, that tradeoff is very worthwhile.
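In symbols (standard notation, as in Sutton and Barto), the shared objective is to maximize the expected discounted return, and both DP and RL lean on the Bellman (optimality) equation relating a state's value to those of its successors:

$$
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad
v_*(s) = \max_{a} \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_*(s') \,\bigr]
$$

DP evaluates the right-hand side using the known transition probabilities $p$; RL estimates it from sampled experience instead.
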

Anyway, to brush up on my knowledge, I revisited a textbook DP problem I was taught during my Master's degree: the water allocation problem. The problem is as follows:

A total of 6 units of water is to be allocated optimally to 3 users. The allocation is made in discrete steps of one unit ranging from 0 to 6. With the 3 users denoted as User1, User2 and User3 respectively, the returns obtained from the users for a given allocation are as follows. The problem is to find allocations to the 3 users such that the total return is maximized.

At first glance, given the discrete and finite nature of the problem, an exhaustive search and evaluation of the total returns over the limited number of allocation combinations is wholly possible, but the idea is not to go with this brute-force approach. It is easy to see how quickly the search space can explode with a trivial relaxation or scaling of the problem (say there are a million users, or the total water available is 6 million units instead of just 6). Dynamic Programming, with its bedrock - the Bellman equation - provides a systematic way to efficiently enumerate and evaluate the optimality of feasible solutions to this problem. Describing that solution methodology is out of scope of this article, and there are numerous high-quality resources on the topic that cater to readers of varied mathematical backgrounds.

The solution that I do want to cover here is one using a transformation of this problem into the world of Linear Programming (LP). This was motivated by an interesting PyCon UK YouTube video where Geraint Palmer optimized his Pokemon team to have the best chance of winning. To achieve this, he also makes use of concepts from multi-objective optimization and Pareto fronts within the framework of linear programming, and the whole video is absolutely worth the 26-minute watch for his explanations. The star of the show, however, is a Python library called PuLP. I won't belabor the library and its capabilities, but suffice it to say it's a really great tool to formulate, translate (to a form that's readable and solvable by other, more commonly used solvers) and solve LP problems.

Coming back to our problem, the way to formulate the water allocation problem as an LP problem is to maximize a function that calculates the total return from the allocations made to each of the three users. After a while of thinking, I came up with this scheme: a function for each user that takes in the water allocated to them and outputs their return (R1(x) or f(x), R2(y) or g(y) and R3(z) or h(z) as shown in the figure above, with x, y and z being the water allocated to each of the users respectively); maximize the sum of these three functions; constrain x, y and z to be integers between 0 and 6; and constrain their sum to equal 6. To achieve this in PuLP, I had to do a few other things. To define each user's return function exactly, with returns specified only at the provided discrete allocations, I introduced a binary variable per user for each candidate allocation from 0 to 6, indicating whether that allocation is the one chosen - meaning only one of them should ever equal 1, per user. To ensure that, I added a constraint dictating that their sum be equal to 1 - again for each user. The complete script is here. It's far from groundbreaking, but the fact that it gave the same result - [x, y, z] = [2, 1, 3] with a maximum total return of 28 - that I had obtained using DP back in the day was really validating.
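For the curious, here is a minimal sketch of that formulation in PuLP. The return tables below are placeholders (the actual values come from the course problem), and the variable names are illustrative rather than the ones in the linked script:

```python
import pulp

# Placeholder return tables, for illustration only - substitute the actual values.
# R[user][a] is the return when `user` receives `a` units of water.
R = {
    "User1": [0, 5, 8, 10, 11, 11, 11],
    "User2": [0, 4, 7, 9, 10, 10, 10],
    "User3": [0, 6, 10, 13, 14, 14, 14],
}
users = list(R)
levels = range(7)   # allowed allocations: 0..6 units

prob = pulp.LpProblem("water_allocation", pulp.LpMaximize)

# b[u][a] == 1 means user u is allocated exactly a units
b = pulp.LpVariable.dicts("b", (users, levels), cat="Binary")

# Objective: total return over all users
prob += pulp.lpSum(R[u][a] * b[u][a] for u in users for a in levels)

# Exactly one allocation level is chosen per user
for u in users:
    prob += pulp.lpSum(b[u][a] for a in levels) == 1

# All 6 units of water must be allocated
prob += pulp.lpSum(a * b[u][a] for u in users for a in levels) == 6

prob.solve()
for u in users:
    chosen = next(a for a in levels if b[u][a].value() > 0.5)
    print(u, "->", chosen, "units")
print("Total return:", pulp.value(prob.objective))
```

With the actual return table plugged in, this should reproduce the [2, 1, 3] allocation with a total return of 28.
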

Okay, now on to RL. Grokking Deep Reinforcement Learning (2020) by Miguel Morales was a really good starting point, picking up where DP left off. The gridworld example it repeatedly uses (much like the OG Sutton RL book, 2018 edition) to communicate core concepts is really effective. There is such a glut of content online on the subject of RL - video-based material swamping out texts and articles - that it is hard not to get confused about how each of the RL algorithms works. Morales' book cuts right through that haze. Once you religiously go through the basic definitions - MDP, state, action, policy, state- and action-value functions - followed by the first exact DP solution methods, Policy Iteration (and the distinctly small Value Iteration discussion), most of the algorithms after that should be a lot more digestible, regardless of which resource you follow thence.

Personally, I felt the chapter after Policy and Value Iteration, on the Multi-Armed Bandit, was a bit of a digression, though I'm sure the author thought it best to keep it there as a prelude to the "actual" RL methods to come (Monte Carlo evaluation methods). I also felt there was a definite switch from an integrated discussion of the "prediction problem" (policy evaluation) and the "control problem" (policy improvement) to a style where concepts were siloed off into algorithms for policy evaluation and those for policy improvement, which affected my motivation and attention, and at times felt a little crammed. This was when I restarted my search online to pick up after the exact methods (a.k.a. the DP methods - Policy and Value Iteration).

Oh, before that though, the Google DeepMind RL lecture series by David Silver was also really enlightening for the Policy and Value Iteration parts. For the first "real" RL method - one that doesn't depend on having access to the MDP's transition probabilities to reach a solution - Monte Carlo sampling (First-Visit and Every-Visit variants), hands down the best down-to-earth explanation was provided by Prof. Saeed Saeedvand of NTU, Taiwan here. I would also like to mention Kimia Lab's videos by Prof. Tizhoosh, which provided a very high-level intuition of RL methods to get started. Stanford University also has an RL series on YouTube, but it didn't catch my fancy for whatever reason. Only after watching Prof. Saeedvand's lecture on MC methods did I feel I understood them enough to follow Sutton's chapter on the topic, after which I started to appreciate the dichotomy between policy evaluation and policy improvement algorithms (as I understand it so far, there's not a lot going on in the evaluation space) and why the Grokking textbook was structured the way it was (I think, anyway). I have started reading up on what's apparently called the SARSA algorithm and on Q-learning, but haven't implemented anything with those yet. However, I do have a simple gridworld application in JavaScript/TypeScript implemented with Policy/Value Iteration and Monte Carlo methods that was really fun and insightful:
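That app is in JavaScript/TypeScript; as a language-agnostic illustration of what Value Iteration is doing under the hood, here is a minimal, self-contained Python sketch on a made-up 4x4 gridworld (grid size, rewards and discount factor are all just illustrative choices, not taken from the app):

```python
import numpy as np

# 4x4 gridworld: terminal state at the bottom-right corner,
# reward of -1 per move until the terminal state is reached
N = 4
gamma = 0.99
terminal = (N - 1, N - 1)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    """Deterministic transition: bumping into a wall leaves the state unchanged."""
    if state == terminal:
        return state, 0.0
    nxt = (min(max(state[0] + action[0], 0), N - 1),
           min(max(state[1] + action[1], 0), N - 1))
    return nxt, -1.0

# Value Iteration: sweep V(s) <- max_a [ r(s,a) + gamma * V(s') ] until convergence
V = np.zeros((N, N))
while True:
    delta = 0.0
    for i in range(N):
        for j in range(N):
            if (i, j) == terminal:
                continue
            candidates = []
            for a in actions:
                nxt, r = step((i, j), a)
                candidates.append(r + gamma * V[nxt])
            best = max(candidates)
            delta = max(delta, abs(best - V[i, j]))
            V[i, j] = best
    if delta < 1e-6:
        break

print(np.round(V, 2))   # values get less negative the closer a state is to the terminal corner
```

Policy Iteration and the Monte Carlo methods estimate the values differently, but they all cycle between some form of evaluation and improvement in the same spirit.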