
Thursday, December 26, 2024

Knowledge graphs enhance LLM performance

The AI landscape is evolving rapidly, particularly in how we approach knowledge retrieval for large language models (LLMs). While vector databases have gained significant traction, I'm convinced that knowledge graphs represent a more powerful paradigm for next-generation AI systems.

Why knowledge graphs? They offer structured relationships and semantic context that simple vector similarity can't match. These graphs, refined over decades of development, provide rich interconnections that capture the nuanced relationships between concepts. Projects like DBpedia demonstrate how well-curated knowledge graphs can enhance AI applications with deeper contextual understanding and more precise information retrieval.
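
As a small illustration (just a sketch using the SPARQLWrapper package and the public DBpedia endpoint; the particular query and resource are arbitrary examples), here's how a program might pull structured facts out of a knowledge graph to ground an LLM's context:

    # Sketch: fetch a structured fact from DBpedia over SPARQL.
    # Assumes `pip install SPARQLWrapper` and that the public endpoint is up.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract WHERE {
            <http://dbpedia.org/resource/Python_(programming_language)>
                dbo:abstract ?abstract .
            FILTER (lang(?abstract) = "en")
        } LIMIT 1
    """)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["abstract"]["value"])

Unlike a nearest-neighbor lookup in a vector store, the query walks explicit, typed relationships, so the answer comes back with its semantics intact.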

I'm also curious about the democratization of LLM application development through new low-code tools. Three platforms have caught my attention:

* LangFlow - Visual programming for LangChain applications
* Flowise - Drag-and-drop UI for building LLM workflows
* ChainForge - Interactive environment for prompt engineering and evaluating LLM responses

These tools are transforming how we build AI applications, making it possible to create sophisticated LLM-powered solutions through intuitive block diagrams and parameter configuration, generating production-ready Python or JavaScript code.

Tuesday, April 15, 2014

A few decades' progress in computers and electronics

I got my bachelor's degree in EE in 1981 and went to work doing board-level design. Most circuit assembly was wire-wrap back then, and we had TTL and CMOS and 8-bit microcontrollers. Most of the part numbers I remember from my early career are still available at places like Digikey. The job of programming the microcontroller or microprocessor on a board frequently fell to the hardware designer. In some ways it was a wonderful time to be an engineer. Systems were trivially simple by modern standards, but they were mysterious to most people, so you felt like a genius to be involved with them at all.

After about 15 years of hardware engineering with bits of software development on an as-needed basis, I moved into software engineering. The change was challenging but I intuited that the years to come would see huge innovation in software engineering, and I wanted to be there to see it.

Around that time, the web was a pretty recent phenomenon. People were learning to write static HTML pages and CGI scripts in Perl. One of my big hobby projects around that time was a Java applet. Some people talked about CGI scripts that talked to databases. When you wanted to search the web, you used Alta Vista. At one point I purchased a thick book of the websites known to exist at the time, I kid you not. Since many websites were run by individuals as hobbies, the typical lifespan of any given website was short.

Software development in the 80s and 90s was pretty hit-or-miss. Development schedules were almost completely unpredictable. Bugs were hard to diagnose. The worst bugs were the intermittent ones, things that usually worked but still failed often enough to be unacceptable. Reproducing bugs was tedious, and more than once I remember setting up logging systems to collect data about a failure that occurred during an overnight run. Some of the most annoying bugs involved race conditions and other concurrency issues.

Things are very different now. I've been fortunate to see an insane amount of improvement. These improvements are not accidents or mysteries. They are the results of hard work by thousands of engineers over many years. With several years of hindsight, I can detail with some confidence what we're doing right today that we did wrong in the past.

One simple thing is that we have an enormous body of open source code to draw upon, kernels and web servers and compilers and languages and applications for all kinds of tasks. These can be studied by students everywhere, and anybody can review and improve the code, and with rare exceptions they can be freely used by businesses. Vast new areas of territory open up every few years and are turned to profitable development.

In terms of concurrent programming, we've accumulated a huge amount of wisdom and experience. We know now what patterns work and what patterns fail, and when I forget, I can do a search on Stackoverflow or Google to remind myself. And we now embed that experience into the design of our languages, for instance, message queues as inter-thread communication in JavaScript.

Testing and QA is an area of huge progress over the last 20 years. Ad hoc random manual tests, usually written as an afterthought by the developer, were the norm when I began my career, and many managers frowned upon "excessive" testing that we would now consider barely adequate. Now we have solid widespread expertise about how to write and manage bug reports and organize workflows to resolve them. We have test-driven development and test automation and unit testing and continuous integration. If I check in bad code today, I break the build, suffer a bit of public humiliation, and fix it quickly so my co-workers can get on with their work.

I miss the simplicity of the past, and the feeling of membership in a priesthood, but it's still better to work in a field that can have a real positive impact on human life. In today's work environment that impact is enormously more feasible.

Wednesday, December 11, 2013

ZeroMQ solves important problems

ZeroMQ solves big problems in concurrent programming. It does this by ensuring that state is never shared between threads/processes, and the way it ensures that is by passing messages through queues dressed up as POSIX sockets. You can download ZeroMQ from the project's website, zeromq.org.

The trouble with concurrency arises when state or resources are shared between multiple execution threads. Even if the shared state is only a single bit, you immediately run into the test-and-set problem. As more state is shared, the locks proliferate and the number of possible interleavings grows exponentially. This business of using locks to identify critical sections of code and protect resources has a vast computer science literature, which tells you that it's a hard problem.
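
As a toy illustration (a contrived sketch, not anything from a real project), two threads doing an unprotected read-modify-write on the same counter will quietly lose updates:

    # Sketch: a classic lost-update race on a shared counter.
    import threading

    counter = 0

    def bump(n):
        global counter
        for _ in range(n):
            counter += 1   # read, add, write back -- not atomic

    threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Expected 200000; depending on the interpreter and timing, lost updates
    # typically leave the total short of that.
    print(counter)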

Attempted solutions to this problem have included locks, monitors, semaphores, and mutexes. Languages (like Python or Java) have assumed the responsibility of packaging these constructs. But if you've actually attempted to write multithreaded programs, you've seen the nightmare it can be. These things don't scale to more than a few threads, and the human mind is unable to consider all the possible failure modes that can arise.

Perhaps the sanest way to handle concurrency is via shared-nothing message passing. The fact that no state is shared means that we can forget about locks. Threads communicate via queues, and it's not so difficult to build a system of queues that hide their mechanics from the threads that use them. This is exactly what ZeroMQ does, providing bindings for C, Java, Python, and dozens of other languages.
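
Here's roughly what that looks like with the pyzmq bindings (a minimal sketch using an in-process PAIR pipe, not production code): each thread owns its own socket, and the only thing that crosses the boundary is a message.

    # Sketch: shared-nothing message passing between two threads via ZeroMQ.
    # Assumes `pip install pyzmq`.
    import threading
    import zmq

    ctx = zmq.Context()

    def worker():
        # The worker owns its socket; no state is shared with the main thread.
        sock = ctx.socket(zmq.PAIR)
        sock.connect("inproc://pipe")
        while True:
            msg = sock.recv_string()
            if msg == "quit":
                break
            sock.send_string(msg.upper())
        sock.close()

    main_sock = ctx.socket(zmq.PAIR)
    main_sock.bind("inproc://pipe")    # bind before the worker connects
    threading.Thread(target=worker).start()

    main_sock.send_string("hello")
    print(main_sock.recv_string())     # -> HELLO
    main_sock.send_string("quit")
    main_sock.close()
    ctx.term()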

For decades now, programming languages have attempted to provide concurrency libraries with various strengths and weaknesses. Perhaps concurrency should have been identified as a language-neutral concern long ago. If that's the case, then the mere existence of ZeroMQ is progress.

Here are some ZeroMQ working examples. There's also a nice guide online, also available in book form from Amazon and O'Reilly.

Thursday, May 30, 2013

Still plugging away on the FPGA synthesizer

I really bit off a good deal more than I could chew by trying to get that thing running as quickly as I did. A lot of what I'm doing now is going back over minute assumptions about what should work in real hardware, trying to get MyHDL's simulator to agree with Xilinx's ISE simulator (ISE Sim doesn't like datapaths wider than 32 bits) and trying to get the chip to agree with either of them. The chip seems to have a mind of its own. Very annoying.

Anyway, I've moved this stuff into its own Github repository so you can show it to your friends and all stand around mocking it without the distraction of other software I've written over the years. So, for as long as it still doesn't work (and with, I hope, the good grace to do it behind my back), y'all can continue with that mocking. Once it actually does what it's supposed to do, all mocking must of course cease.

Saturday, May 18, 2013

My FPGA design skills are a little rustier than I thought

Today I'm going to Makerfaire in the Bay Area. I'd had an idea percolating in my head to use an FPGA to implement fixed-point equivalents of the analog music synthesizer modules of the 1970s, and gave myself a couple of weeks to design and build a simple synthesizer. I'd been a synthesizer enthusiast in high school and college, having attended high school with the late David Hillel Wilson and having had many interesting discussions with him about circuit design for synthesizers, a passion he shared with his father. While he taught me what he knew about synthesizers, I taught him what I knew about electronics, and we both benefitted.

Now I have to confess that since my switch to software engineering in the mid-90s, I haven't really done that much with FPGAs, but I've fooled around a couple of times with Xilinx's ISE WebPack software and stumbled across MyHDL, which dovetailed nicely with my long-standing interest in Python. So I ordered a Papilio board and started coding up Python which would be translated into Verilog. My humble efforts appear on Github.

There was a lot of furious activity over the two weeks before Makerfaire, which I hoped would produce something of interest, and I learned some new things, delta-sigma DACs among them. Being an impatient reader, I designed the delta-sigma DAC myself from scratch, and ended up diverging from how it's usually done. My design maintains a register with an estimate of the capacitor voltage on the RC lowpass driven by the output bit, and updates that register (requiring a multiplier because of the exp(-dt/RC) term) as it supplies bits. It works, but has a failure mode of generating small audible high-frequency artifacts, particularly when the output voltage is close to minimum or maximum. On the long boring flight out, I had plenty of time to think about that failure mode, and it seems to me the classic delta-sigma design would almost certainly suffer from it too. I think it could be reduced by injecting noise, breaking up the repetitive patterns that appear in the bitstream.
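
The update loop, in rough Python terms (an illustrative sketch, not the actual MyHDL code; floating point stands in for the fixed-point math, and the names are made up):

    # Sketch: drive the output bit from an estimate of the RC filter's
    # capacitor voltage, decaying the estimate toward the driven rail each step.
    import math

    def delta_sigma_bits(target, n_bits, dt_over_rc=0.05):
        """Yield output bits for a constant target level in [0.0, 1.0]."""
        alpha = math.exp(-dt_over_rc)   # the exp(-dt/RC) term
        estimate = 0.0                  # estimated capacitor voltage
        for _ in range(n_bits):
            bit = 1 if estimate < target else 0
            # the capacitor relaxes toward the rail the bit is driving
            estimate = bit + (estimate - bit) * alpha
            yield bit

    bits = list(delta_sigma_bits(0.25, 1000))
    print(sum(bits) / len(bits))   # long-run duty cycle approximates the target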

I like Python a lot, but I'm not sure I'm going to stay with the MyHDL approach. As I learn a little more about Verilog, it seems like it would probably be better to design directly in Verilog. The language doesn't look that difficult as I study MyHDL's output, and while books on Verilog tend to be expensive, some are affordable on the Kindle and a couple of others in paper form.

MyHDL-translated designs do not implement Verilog modularity well, and I think it would be good to build up a library of Verilog modules in which I have high confidence. MyHDL's simulation doesn't always completely agree with what the Xilinx chip will do. And while MyHDL.org talks a lot about how great it is to write tests in Python, the Verilog language also provides substantial support for testing. Verilog supports signed integers, but as far as I've seen, MyHDL doesn't (this is INCORRECT, please see addendum below), and for the fixed-point math in the synth modules, that alone would have steered me toward straight Verilog a lot sooner had I been aware of it.

It appears the world of Verilog is much bigger and much more interesting than I'd originally thought. I've started to take a look at GPL Cver, a Verilog interpreter that (I think) has debugger-like features such as setting breakpoints and single-stepping your design. I had been thinking about what features I'd put into a Verilog interpreter if I were writing one, and a little googling showed me that such a thing already existed. So I look forward to tinkering with Cver when I get home from Makerfaire.

EDIT: Many thanks to Jan Decaluwe, the developer of MyHDL, for taking the time to personally respond to the challenges I encountered with it. Having had a couple of days to relax after the hustle and bustle of Makerfaire, and get over the disappointment of not getting my little gadget working in time, I can see that I was working in haste and neglected to give MyHDL the full credit it deserves. At the very least it explores territory that is largely uncharted, bringing modern software engineering to the HDL world where (like all those computational chemists still running Fortran code) things have tended to lag behind the times a bit.

In my haste, I neglected the documentation specifically addressing signed arithmetic in MyHDL. I didn't take the time to read the docs carefully. As Jan points out in his writings and in the comment to this blog, MyHDL's approach to signed arithmetic is in fact simpler and more consistent than that of Verilog. What does signed arithmetic look like in MyHDL? It looks like this.

    >>> from myhdl import Signal, intbv

    # INCORRECT
    >>> x = Signal(intbv(0)[8:])
    >>> x.next = -1
    Traceback (most recent call last):
        ...blah blah blah...
    ValueError: intbv value -1 < minimum 0

    # CORRECT, range is from min to max-1 inclusive
    >>> x = Signal(intbv(0, min=-128, max=128))
    >>> x.next = -1      # happy as a clam

As for the case where MyHDL's behavior appeared to diverge from that of the physical FPGA: my numerically-controlled amplifier circuit uses one of the hardware multipliers in the XC3S500E, which multiplies two 18-bit unsigned numbers to produce a 36-bit unsigned product. When my music synthesizer was at one point unable to make any sound, I tracked the problem down to the amplifier circuit, which was working fine in simulation. There was already a hardware multiplier working in the delta-sigma DAC. I poked at things with a scope probe, scratched my head, studied my code and other people's code, and ultimately determined that I needed to latch the factors in registers just prior to the multiplier. Whether that was exactly the cause, I still can't say, but after that change the amp circuit finally worked correctly.
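
The fix, in MyHDL terms, looks roughly like this (an illustrative sketch with made-up names, not the actual synth code): latch the operands on the clock edge, then multiply the registered copies.

    # Sketch: register the factors before feeding the hardware multiplier.
    from myhdl import always, Signal, intbv

    def registered_mult(clk, a, b, product):
        """a, b: 18-bit unsigned inputs; product: 36-bit unsigned output."""
        a_r = Signal(intbv(0)[18:])
        b_r = Signal(intbv(0)[18:])

        @always(clk.posedge)
        def logic():
            # latch the factors first ...
            a_r.next = a
            b_r.next = b
            # ... then multiply the registered copies on the next edge
            product.next = a_r * b_r

        return logic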

I wrongly concluded that it indicated some fault in MyHDL's veracity as a simulator. If it didn't work in the chip, it shouldn't have worked in simulation. But with more careful thought I can see that it's really an idiosyncrasy of the FPGA itself, or perhaps the ISE Webpack software. I would expect to run into the same issue if I'd been writing in Verilog. I might have seen it coming if I'd done post-layout simulation in Webpack, and I should probably look at doing that. Once the bits are actually inside the chip, you can only see the ones that appear on I/O pins.