Graph-model metaschema for progress codebase - any interest in AI driven code analysis/modernization?

TomScott

New Member
Good morning all,

We’ve developed a robust method for converting our entire Progress codebase into a “metaschema” (database layer). This metaschema mirrors a graph structure, using keys rather than direct relationships, which allows us to analyze the codebase in novel ways. Recently, we converted our entire schema to Neo4j - this process took just a couple of days with our internal tools, and it’s proven to be a powerful approach for understanding and modernizing legacy Progress systems.
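To sketch the key-based idea (a toy Python example; the record shapes and names here are invented stand-ins, not our actual metaschema): because the metaschema links records by keys rather than direct relationships, it maps to graph nodes and edges almost mechanically.

```python
# Toy metaschema records: each record carries its own key plus the keys it
# references. Labels, keys, and the "calls" field are illustrative only.
records = [
    {"type": "Procedure", "key": "p-order-calc", "calls": ["p-tax", "p-discount"]},
    {"type": "Procedure", "key": "p-tax", "calls": []},
    {"type": "Procedure", "key": "p-discount", "calls": ["p-tax"]},
]

# Nodes come straight from the records; edges come from the key references.
nodes = {r["key"]: r["type"] for r in records}
edges = [(r["key"], callee, "CALLS") for r in records for callee in r["calls"]]

print(nodes)
print(edges)
```

The point is that no parsing or inference is needed at this stage: the graph structure is already latent in the keys.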

Now with AI integration, we're seeing unique advantages: rather than embedding whole methods into an AI context (as Cursor or Windsurf do), our system can trace specific code paths, dramatically reducing token usage and improving relevance and accuracy. This is especially valuable given the complexity of older Progress codebases (we fully support v9 and massive logic blocks/files), which many newer AI IDEs don't handle well. We're thinking this could be helpful for moving from older v9 to newer web options from Progress in one pass, rather than involving any hand coding.

I’m posting for a few reasons:
  1. Interest check: Are others here interested in this concept, or in sharing experiences with similar approaches?
  2. Connection request: Does anyone have contacts at Progress who’d be open to a conversation about partnership or collaboration? I haven’t had much luck via email or LinkedIn.
  3. Early users: Would anyone running an older Progress environment have an interest in trying this tool out in the near future?
Happy to share more details. Thanks for your thoughts!

Tom
 
I would love to see more of this, because I am truly curious how this works and what it can do. However, I don't think I could really put it to use, since the client I am at now decided to move away from Progress by rebuilding the application in .Net. Yes, this tool might have been of use then, but the project is well underway now (although - surprise, surprise - rebuilding a 20+ year old system from scratch turned out to be a little more difficult than expected).
 
I would love to see more of this, because I am truly curious how this works and what it can do. However, I don't think I could really put it to use, since the client I am at now decided to move away from Progress by rebuilding the application in .Net. Yes, this tool might have been of use then, but the project is well underway now (although - surprise, surprise - rebuilding a 20+ year old system from scratch turned out to be a little more difficult than expected).
It would make a great Progress User Group case study! I mean the fact it's hard to move away ;)
 
Forgive me if I am just an old fossil, but I am having difficulty understanding the attraction of migrating a schema to Neo4j. This approach seems like it might be helpful for OLAP applications, or for adding on such functionality, but I don't see anything there that would be very helpful for business transaction processing.

But perhaps I am missing the point?

Some specific and detailed examples of "analyze the codebase in novel ways", "unique advantages", and "moving from older v9 to newer web options from Progress" would probably go a long way toward shedding light on what this tooling actually does and whether or not it is interesting.

FWIW, I do have some very old legacy code hanging around that I would be happy to try to modernize. But my pain threshold for trying fancy new tooling is very low.
 
It would make a great Progress User Group case study! I mean the fact it's hard to move away ;)
It would definitely be interesting. I'm also surprised to have not heard back from anyone at Progress - it seems like with the rise of AI IDEs, they'd be intrigued by a similar concept.
 
It would definitely be interesting. I'm also surprised to have not heard back from anyone at Progress - it seems like with the rise of AI IDEs, they'd be intrigued by a similar concept.

The easiest way to hunt down the right people and get a conversation going is to attend a PUG.
 
Forgive me if I am just an old fossil, but I am having difficulty understanding the attraction of migrating a schema to Neo4j. This approach seems like it might be helpful for OLAP applications, or for adding on such functionality, but I don't see anything there that would be very helpful for business transaction processing.

But perhaps I am missing the point?

Some specific and detailed examples of "analyze the codebase in novel ways", "unique advantages", and "moving from older v9 to newer web options from Progress" would probably go a long way toward shedding light on what this tooling actually does and whether or not it is interesting.

FWIW, I do have some very old legacy code hanging around that I would be happy to try to modernize. But my pain threshold for trying fancy new tooling is very low.
To clarify, our tool isn't migrating the schema, but the actual business logic. This may be a long winded response...

First, migrating may be the wrong word, as there isn't a new system that uses the graph db for transaction processing. To understand, and I apologize if this is already known, we need to talk a little about AI and how it works. Basically, an AI model is a tool that seems to know everything, but to talk to it you have to pass it a context window. At first, these context windows were very small, but recently they've grown, with OpenAI's o3 reasoning model having a 200k token window. To give you an idea of what that means in the real world: yesterday I was exploring differences in output, and a single program of around 7000 lines accounted for about 175k tokens.
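As a rough illustration of that arithmetic (using the common rule of thumb of about 4 characters per token; real tokenizers like tiktoken give exact, model-specific counts, and the 4GL line below is just an invented example):

```python
# Crude token estimate: ~4 characters per token is a widely used
# approximation for English-like text and code.
def estimate_tokens(source: str) -> int:
    return len(source) // 4

# An invented ABL-style line padded to 100 characters, repeated 7000 times,
# stands in for a large legacy program.
line = "ASSIGN cust.balance = cust.balance + order.total.".ljust(100)
program = "\n".join([line] * 7000)

print(estimate_tokens(program))  # on the order of 175k tokens
```

So a single big program can consume most of a 200k-token window on its own, before any instructions or conversation are added.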

Then there are two primary ways most people are using AI these days:
  1. “Needle-in-a-haystack” semantic search - ideal for huge windows.
  2. Quality search, where the model fully grasps the input and can reliably return an output without missing ANY detail. I've found this to only be reliable on smaller chunks of, say, up to 400 lines, but this is slowly increasing.
Now onto graph dbs. You may have heard of Cursor, Lovable, or Windsurf (just bought by OpenAI). These tools work very well with newer, modern stateless environments and smaller self-contained methods. They can index a codebase and will essentially summarize and embed each method of the application. They then store these full embeddings as a graph db - I'm assuming. The benefit of this approach is that when a user comes along who's not that experienced, they can say "I want this to be added". The AI embeds this query, searches the graph for the most similar methods, and the graph then allows related entities to be pulled back that symbolically match but might not have been found using a purely semantic search (RAG vs. graph). Now they pass all that context into the window, the AI model takes it, and starts making suggestions or changes.

But this is where the problems arise. First, AI has a tendency to fully rewrite code blocks, so if a section of code is large (as much code is), it will miss things on the rewrite. It will then try to refactor the code, and you inevitably start developing issues, since these new tools are very bad at updating the rest of the codebase to use the newly refactored method.
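A toy sketch of that first retrieval step, embedding similarity search, with tiny hand-made vectors standing in for learned embeddings (the method names and numbers are invented):

```python
import math

# Pretend each method has already been summarized and embedded.
# Real embeddings have hundreds of dimensions; these are stand-ins.
method_embeddings = {
    "calc-invoice-total": [0.9, 0.1, 0.0],
    "apply-tax-rate":     [0.6, 0.5, 0.2],
    "render-login-page":  [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stand-in embedding for a query like "fix the invoice total calculation".
query = [0.85, 0.2, 0.05]

ranked = sorted(method_embeddings,
                key=lambda m: cosine(query, method_embeddings[m]),
                reverse=True)
print(ranked)
```

The graph step the tools add on top would then pull in entities *related* to the top hits (callers, shared tables), which pure similarity ranking like this misses.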

There's also the problem of context. Yes, newer models support larger windows (up to 10 million tokens for Llama), but these are mostly only useful for search (needle-in-a-haystack). Beyond that, larger token windows really only benefit the vendor (Cursor, OpenAI, etc. charge based on token use), since it's a little like reading 5-10 whole books for EACH question of a test.

Now, our tool. We take our full codebase and break the logic down into an independent database layer where the full business logic is captured to its smallest detail. There is no programming language used within the db. There are many benefits to this design, and we've used it for years, but recently we realized that the design of this system closely matched a graph db. To test this, we converted an existing Progress db to Neo4j and ended up with millions of nodes and relationships. Whether or not a graph db like Neo4j is necessary is still being determined, as we can rapidly query our existing Progress 4GL db, but the graph does bring a visual aspect and interesting ways to traverse the structure, and it works more directly with modern cloud environments.
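To give a feel for what that conversion step could look like, here is a hypothetical sketch that emits Cypher MERGE statements from key-linked records. The labels, keys, and relationship names are invented, and the real mapping is far richer than a flat (node, relationship, node) shape:

```python
# Invented (label, key, relationship, label, key) tuples standing in for
# rows pulled out of a metaschema-style db.
records = [
    ("Procedure", "p-order-calc", "CALLS", "Procedure", "p-tax"),
    ("Procedure", "p-order-calc", "READS", "Table", "customer"),
]

def to_cypher(src_label, src_key, rel, dst_label, dst_key):
    # MERGE is idempotent: re-running the load doesn't duplicate nodes/edges.
    return (
        f"MERGE (a:{src_label} {{key: '{src_key}'}}) "
        f"MERGE (b:{dst_label} {{key: '{dst_key}'}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )

statements = [to_cypher(*r) for r in records]
print(statements[0])
```

A batched load (e.g. via neo4j-admin import or parameterized UNWIND queries) is what makes "millions of nodes in a couple of days" plausible; one statement per edge is shown here only for clarity.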

Whichever db we use, when combined with AI and some enrichment methods to turn code into English, the outcome is a fairly complete code analyzer that seems able to rival or beat even extremely experienced devs (30+ years). So, to reiterate: there are other similar tools out there (Sourcegraph, tree-sitter, the newer VS Code offshoots like Cursor, etc.), except with ours we can extract the key context without AI, and we can extract much more specific code trails than a related list of full methods. This means smaller context windows, less reliance on AI until it's actually needed, fewer mistakes (remember, understanding degrades across large context windows), lower costs, and easier access for both less knowledgeable devs and experienced ones.
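As a toy example of what I mean by a "code trail" (all names invented): the shortest chain of edges connecting two nodes in the graph, rather than a pile of full methods. Only the handful of procedures on that chain need to go into the AI's context window:

```python
from collections import deque

# Invented call graph: procedure -> procedures/tables it references.
graph = {
    "order-entry":  ["calc-total", "print-ticket"],
    "calc-total":   ["apply-tax"],
    "apply-tax":    ["tax-table"],
    "print-ticket": [],
    "tax-table":    [],
}

def trail(start, goal):
    """Breadth-first search: returns the shortest path start -> goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt == goal:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(trail("order-entry", "tax-table"))
```

Instead of embedding every method that mentions "tax", you hand the model just the four nodes on this trail, which is where the token savings come from.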

Our tool is still in its infancy, as we're still learning what's possible, but this is really where we'd love a partner like Progress, since we're just one vendor and the benefit of this would come from many. But we think it's pretty reasonable to see it being used as a method to replace legacy devs, or to determine modernization starting points (which code could easily be refactored, or is repeated), or as a security tool (finding holes), or as a full system migration tool to newer versions of OpenEdge or even to other languages entirely. There's even the potential to one day skip the compiler and go straight from the db to the UI, which with AI seems like it will eventually become the standard - perhaps just with a layer to manage changes.

Hopefully this makes sense, but feel free to email me if you have any interest - tscott@benedictgroup.com.
 
The easiest way to hunt down the right people and get a conversation going is to attend a PUG.
I was reading about this recently - do you mean the one in Boston in September? That does feel very far away, though - any thoughts on something sooner? Thanks for the help.
 
September will be here before you know it, and I believe PSC have done most of their other outreach stuff this year as that tends to be in the earlier half of their fiscal year.
 
I was reading about this recently - do you mean like the one in Boston in September? I do feel like that's very far away though - any thoughts on a method sooner? Thanks for the help.
Yes, the Boston PUG is late September and early October; Europe is in early November. As James says, time flies, it will be upon us before you know it.

FWIW, that all _sounds_ wonderful, but so has every fad of the last 75 years of computing. Some concrete and digestible examples sure would be nice.
 
Yes, the Boston PUG is late September and early October; Europe is in early November. As James says, time flies, it will be upon us before you know it.

FWIW, that all _sounds_ wonderful, but so has every fad of the last 75 years of computing. Some concrete and digestible examples sure would be nice.
Here is an example: I have a program that does many calculations. If a problem is reported, I could dump the full program into AI. Often AI can find the issue and point a dev toward the problem, or even suggest and make the change. But here are two scenarios. In the first, the dev doesn't know which program the problem is in; discovering that can be tricky without extensive prior knowledge or amazing documentation. In the second, there is a problem or change request that will extend across multiple programs in a related way. How will a dev learn where those programs are and what changes need to be made?

With large programs and the context window limits that exist now, scenario 1 may be solvable with current AI tools (not built for Progress), but scenario 2 is rarely solvable by any current AI tool. It's really difficult to talk about these things without writing a novel, but I'll try to check out a PUG. There are many other scenarios too. It's an exciting area, and I just can't see how it could be a fad.
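A toy sketch of why a graph helps with scenario 2: reverse edges answer "which programs touch this thing?" directly, which is exactly the cross-program question a semantic search struggles with. All program and table names below are invented:

```python
# Invented program -> tables-used edges, as they might come out of the graph.
uses = {
    "ar-post":     ["customer", "invoice"],
    "order-entry": ["customer", "order"],
    "label-print": ["item"],
}

# Invert into table -> programs ("reverse edges"), so a change request like
# "rename the customer credit field" immediately yields its blast radius.
touched_by = {}
for prog, tables in uses.items():
    for t in tables:
        touched_by.setdefault(t, set()).add(prog)

print(sorted(touched_by["customer"]))
```

In a real graph db this inversion is free (relationships are traversable in both directions), so the impact set for a change comes back from one query rather than from a dev's memory.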
 
Let me add another vote for the PUGgy events. In particular, both at these events and through any other channels you can find, the key goal is to find a champion at Progress. Finding one will make a night and day difference in the attention you can get. A good route might be to develop a talk which shows off your tool. Even better, if you are ready for it, would be to propose a workshop.
 