AI Trends for 2024: What’s Old is New Again

There’s a lot of shock and awe in the AI space these days. And lots of money on the table. But through the Sturm und Drang, some trends are emerging. To level-set, I went back and reread our last published article together, Moving Forward in 2020: Technology Investment in ML, AI, and Big Data (William F Jolitz & Lynne G Jolitz, Cutter Business Journal, 7 April 2020).

Four years ago, AI was at a crossroads. When we looked at traditional value propositions in technology, where one moves from a specific technology to a target customer in a high-value sector and then to broader sectors and uses, AI was doing miserably. 70% of companies said their AI projects provided little to no benefit, and only 40% of companies said they had made a significant investment in AI. The frustration lay with “products sold with ill-defined benefits” which led to “unsustainable revenue that plummets when customers become disillusioned from a tactical lack of sales focus”. We stated the key problem was that “the startup’s sales focus no longer aligns with the customer’s strategic focus”. In tech speak, they couldn’t figure out what to do with it and got disappointed.

We suggested what we called an “axiomatic” approach: “Instead of moving from technology to key customers with an abstracted TAM (Total Available Market), we must instead quantify AI and ML benefits where they specifically fit within business strategies across segment industries”. We then highlighted three areas to watch: surveillance, entertainment, and whitespace, while also discussing how ad hoc architectures can disrupt cloud service costs and security. In terms of architectures, there is now more focus on data ownership and control, as well as on reducing cloud costs. But for most customers, it’s still very much the same as four years ago.

But the key prediction where we were literally “on the money” was our analysis of chaotic disruption of the market, driven and funded by “super angels”. This is how companies like OpenAI spawned and spurred tremendous disruption in a very short timeframe:

“Venture capital (VC) investments in ML/AI fixate on a startup’s ability to obtain go-to-market sales by disintermediating other vendors and to lock-up highly profitable (yet elusive) opportunities. The VC’s intent is startup validation and gauging threats to other vendors’ uncompetitive businesses that will drive the startup’s ability to gain partnerships and revenue shares. However, sometimes, the result is not what VCs would wholly desire but rather more like paralysis with no clear “win” — because the startup only partially engages the customers and does not succeed in displacing other vendors. To force the win, tactical deconstructing/reconstructing of AI/ML solutions around existing layers of edge and cloud platforms as an investment category is akin to desperately reshuffling poker chips on the poker table. This is best avoided. Industry disruption is inherently unstable. Like an ouroboros, it can abruptly turn from obvious low-hanging fruit targets to feeding off earlier successful targets undergoing a state of change.  

The potential for radically greater opportunities is more interesting than patiently maintaining course or re-navigating the rough waters to see existing ventures through to a reasonable conclusion. This potential is the realm of super angels, self-funders, and leading edge “winners.” These individuals and groups see no disadvantage to riding a chaotic wave because they’ve gotten accustomed to being so far out in front of the self-competition within their newly chosen, ever-shifting “whitespace path.”

However, the traditional VC process is disadvantaged by these groups because venture capitalists’ gut instincts based on the feel of the deal get whipsawed by the loss of bragging rights to ROI, limiting them from getting too far out beyond their headlights. Thus, chaotic disruption is a no-go zone for most. For those who decide to enter these perilous waters, the tendency to share risk across many partners leads to a kind of groupthink at odds with the fast moves and flexibility required of the super angels.”

As open source investments demonstrated, it’s a risky business consuming your own potential customers. In the AI chaotic disruption, all potential customers are considered targets: media, artists, writers, businesses.

In consuming the “long tail” of literature, art, whitepapers, business databases, and personal information and opinion on the Internet and then regurgitating it as facsimiles stripped of authorship and authority, companies like OpenAI and Google whipsawed established players. As we have seen, the rush of businesses and consumers to magnify this effect was phenomenal — and dangerous.

The intent was to rapidly drive panicked companies to sign exclusive agreements and become the dominant company in AI for the next half century. If it sounds unbelievable, note that we now have only a few companies dominating search, content, and connection due to brand recognition and addictive use. It takes a lot of money to maintain an addiction, or establish a new one.

Although the investment space is still suppressed by poor conditions despite all the dry powder, there are a few bright spots. Battery investments continue to spark interest. Climate change companies surge and storm. Crypto was actually legalized by the SEC, because you can’t play with GameStop forever, so it’s time to jump into the big scams, kids. Space investments are, well, vast. AI plays a role in all of these.

But because of the chaotic disruption strategy our Silicon Valley billionaires pursued, AI now has the attention of everyone, from governments, militaries, and NGOs to plain ordinary users. It doesn’t matter if AI is “lazy”. Even the IMF is jumping in.

Will AI benefit humanity? That’s above my pay grade. William and I saw it had unique potential in many areas in 2020. That’s still true in 2024. I hope the chaotic disruption doesn’t prevent us from seeing some real benefits for the better.

Fun Friday: The Race for AI Creative Works Control

In April of 2020, William and I wrote in the Cutter Business Journal an article entitled Moving Forward in 2020: Technology Investment in ML, AI, and Big Data. We focused on three areas: surveillance (monetization), entertainment (stickiness), and whitespace opportunities (climate, energy, transportation). This statement bears emphasis:

Instead of moving from technology to key customers with an abstracted total addressable market (TAM), we must instead quantify artificial intelligence (AI) and machine learning (ML) benefits where they specifically fit within business strategies across segment industries. By using axiomatic impacts, the fuzziness of how to incorporate AI, ML, and big data into an industry can be used as a check on traditional investment assumptions.

[For additional information on this article, please see AI, ML, and Big Data: Functional Groups That Catch the Investor’s Eye (6 May 2020, Cutter Business Technology Advisor).]

But one might be puzzled as to where generative AI tools such as ChatGPT or DALL-E fit in the AI landscape and why we should care about AI art, AI news and press releases, AI homework and essays, even AI music like the machine-made songs Orwell described in 1984.

The reality is these areas utilize easily crawled content available everywhere, lying around in the Internet attic. It also takes tremendous computing power to conduct ML and process this data effectively into some kind of appearance of sensible output. Hence, these tools will remain in the corporate hands of their creators no matter what they claim about “open source”: it’s simply too difficult for anyone but a giant corporate entity to support the huge costs involved. So this is about monetization and stickiness. Large companies are willing and able to pay the cloud costs if the customer gets dependent on using their tools. Flashback to the tool-centric sell of the 1980s, Silicon Valley style. All we need is an AI version of Dr. Dobb’s Journal and we’re all set.

Previous attempts at generative AI have usually focused on small ML datasets, leading to laughable and biased results. Now companies are watching the shift at Google in particular from ads to AI, along with Microsoft and Facebook. Everyone believes they are in a race and is frantically trying to catch up before all that sweet, sweet money is locked up by one of them.

But is there really any need to “catch up”? Is this a real trend, or just an illusion? Google made its fortune on categorizing every web page on the Internet. It had plenty of rivals back then. I was fond of AltaVista myself. But there was also everything from Ask Jeeves to Yahoo.

Now Google and Microsoft are analyzing the contents of these big pots of data with ML. But it’s not just for analyzing. It’s for creating content. Music. Art. News. Opinion. And you need an awful lot of processing power to handle all that data. So it’s now a Big Guy Game.

One of the approaches to eliminating bias is to use ML to process more and more data. The bigger the data pot, the less the bias and error. Well, that’s the assumption, anyway. But it’s a dubious assumption, given the pots analyzed are often variations on a theme. Most search categorization is based on recent pages, not deep page analysis. Google is no Linkpendium.
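The statistical intuition behind that dubious assumption can be seen in a toy simulation (all numbers invented for illustration): adding more data shrinks random error, but if every sample in the pot shares the same slant, the bias never averages away.

```python
import random

random.seed(1)

true_mean = 0.0
shared_slant = 0.5  # a systematic bias shared by the whole "data pot"

# Independent samples: random noise averages out as the pot grows.
independent = [random.gauss(true_mean, 1.0) for _ in range(100_000)]

# "Variations on a theme": every sample carries the same shared slant,
# so adding more of them never averages the bias away.
correlated = [shared_slant + random.gauss(0.0, 1.0) for _ in range(100_000)]

err_independent = abs(sum(independent) / len(independent) - true_mean)
err_correlated = abs(sum(correlated) / len(correlated) - true_mean)

print(err_independent)  # tiny: the noise cancels
print(err_correlated)   # stuck near 0.5: more data, same bias
```

A hundred thousand variations on a theme are still just the theme.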

All this, oddly, reminds me a bit of the UK mad cow fiasco, where the agricultural industry essentially cultivated prions by feeding dead animals to animals. Much as Curie purified radium from pitchblende, the animals that died of the disease were then processed and fed to other animals. And since prions, like radium, persist after processing, they were concentrated and made it up the food chain into humans.

In like kind, the tools’ own output is feeding back into the ML feedlot and being consumed again. It may take longer than a few days, but we will be back to the same problems of bias and error.
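The feedlot loop can be sketched in a few lines of Python. This is a deliberately crude cartoon, not anyone’s actual training pipeline: a toy “model” (just a mean and spread) is fit to a pot of content, then the pot is refilled with the model’s most typical output, since generators favor high-probability content and quietly drop the tails.

```python
import random
import statistics

random.seed(0)

# Start from a "human" pot of content: a wide spread of values.
pot = [random.gauss(0.0, 1.0) for _ in range(4000)]
start_spread = statistics.stdev(pot)

# Each generation: fit the toy model to the pot, regenerate the content
# from it, and keep only the most "typical" half of what it produced.
for generation in range(10):
    mu = statistics.mean(pot)
    sigma = statistics.stdev(pot)
    generated = [random.gauss(mu, sigma) for _ in range(4000)]
    generated.sort(key=lambda x: abs(x - mu))  # most typical first
    pot = generated[:2000] * 2                 # refill by repeating survivors

end_spread = statistics.stdev(pot)
print(start_spread, end_spread)  # the variety collapses generation by generation
```

After ten trips through the feedlot, the pot has almost no variety left. The prions concentrate.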

However, the gimmick of having a “machine” write your essay or news blurb is very tempting. Heck, AIs are claimed to pass medical exams, law classes, and software programming tests better than people.

But being a doctor or attorney or a software engineer is much more than book learning, as anyone who’s done the job will tell you.

And of course, there is now backlash from various groups who value their creative works and are not interested in rapidly generated pale imitations polluting their space and pushing them out. They didn’t consent to have their works pulled into a training set and used by anyone. Imitation is neither sincere nor flattering, and is even legally actionable if protected by copyright, trade secret, or patent. It’s not “fair use” when you suck it all in and paraphrase it slightly differently.

This isn’t new. William and I ran into this in the old days with our 386BSD code. We were happy to let people use it and modify it. What is code if not usable and modifiable? But we asked that provenance be maintained in the copyright itself by leaving the names of the authors. And we had entire modules of code that were written de novo in the days when kernel design meant new ideas. It was an amazing creative time for us.

So I remember how shocked I was when an engineer at Oracle asked me about tfork() and threading: a Linux book he had discussed it, yet he could find nothing in the Linux source code. I pulled out our 386BSD kernel book and showed him that it was novel work done for 386BSD and would not work in Linux. Upon discussion, it turned out that book had just “paraphrased” many of our chapters without even considering that Linux did not incorporate much of our work, because it was a very different architectural design. It misled software designers, but I’m sure it sold a heck of a lot more books than ours did by turning “386bsd” into “linux”. So it is today, only now it’s a heck of a lot easier for the talentless, the craven, and the criminal to steal.

Now many software designers are upset because their source code repositories are used as the models for automated coding, and they don’t like that one bit. And I don’t blame them.

We lived it. And it was a primary reason why the 386BSD project was terminated. Too many trolls and opportunists ready to take any new work and paraphrase it. So get ready to see this happen again in music, art, news, and yes, software. The age of mediocrity is upon us.

1984, here we come…

Fun Friday: AI Technology Investments, Failed Startups, 386BSD and the Open Source Lifestyle and Other Oddities of 2020

First, William Jolitz and I wrote a comprehensive article entitled Moving Forward in 2020: Technology Investment in ML, AI, and Big Data for Cutter Business Journal (April 2020 – paid subscription). Given the pandemic and upheaval in global economies, this advice is even more pertinent today.

Instead of moving from technology to key customers with an abstracted total addressable market (TAM), we must instead quantify artificial intelligence (AI) and machine learning (ML) benefits where they specifically fit within business strategies across segment industries. By using axiomatic impacts, the fuzziness of how to incorporate AI, ML, and big data into an industry can be used as a check on traditional investment assumptions.

For additional information on this article, please see AI, ML, and Big Data: Functional Groups That Catch the Investor’s Eye (6 May 2020, Cutter Business Technology Advisor).

TechCrunch presented its loser brigade list of 2020 failed startups in December 2020 – although a few more might have missed the list by days. Some of these investments were victims of “the right startup in the wrong time”. Others were “the wrong startup in the right time”. And some startups were just plain “the wrong startup – period”.

We mourn the $2.45 billion which vanished into the eager pockets of dreamers and fools (we’re looking at you, Quibi – the pig that swallowed $1.75B of investment and couldn’t get any customers) and feel deeply for the Limiteds who lost money in one of the biggest uptick years in the stock market.

Thirty years have passed since we launched open source operating systems with 386BSD. Open source as a concept has been around for over 40 years, as demonstrated by the amazing GNU GCC compiler done by RMS. But until the mid-1990s, most software was still held under proprietary license – especially the operating system itself. The release of 386BSD spurred the creation of other progeny open source operating systems and a plethora of open source tools, applications, and languages that are standard today. However, the “business” of open source is still much misunderstood, as Wired notes in “The Few, the Tired, the Open Source Coders”. Some of the more precious gems excerpted:

But open source success, Thornton quickly found, has a dark side. He felt inundated. Countless people wrote him and Otto every week with bug reports, demands for new features, questions, praise. Thornton would finish his day job and then spend four or five hours every night frantically working on Bootstrap—managing queries, writing new code. “I couldn’t grab dinner with someone after work,” he says, because he felt like he’d be letting users down: I shouldn’t be out enjoying myself. I should be working on Bootstrap!

“The feeling that I had was guilt,” he says. He kept at it, and nine years later he and Otto are still heading up Bootstrap, along with a small group of core contributors. But the stress has been bad enough that he often thought of bailing. …

…Why didn’t the barn-raising model pan out? As Eghbal notes, it’s partly that the random folks who pitch in make only very small contributions, like fixing a bug. Making and remaking code requires a lot of high-level synthesis—which, as it turns out, is hard to break into little pieces. It lives best in the heads of a small number of people.

Yet those poor top-level coders still need to respond to the smaller contributions (to say nothing of requests for help or reams of abuse). Their burdens, Eghbal realized, felt like those of YouTubers or Instagram influencers who feel overwhelmed by their ardent fan bases—but without the huge, ad-based remuneration.

Been there. Done that.

Not many Linux-come-latelies know this, but Linux was actually the second open-source Unix-based operating system for personal computers to be distributed over the Internet. The first was 386BSD, which was put together by an extraordinary couple named Bill and Lynne Jolitz. In a 1993 interview with Meta magazine, Linus Torvalds himself name-checked their O.S. “If 386BSD had been available when I started on Linux,” he said, “Linux would probably never have happened.”

Linus was able to benefit from our two-year article series in Dr. Dobb’s Journal (the premier coding magazine of the day, now defunct in an age of GitHub), in which, along with the how-to details of “Porting Unix to the 386”, we also included source code in each article. That, coupled with Lions’ Commentary on UNIX (NB – the old encumbered Edition 6 version, and not Berkeley Unix), allowed Linus to cobble together Linux. We had no such issues, as we had access to both Berkeley Unix and a source code license from AT&T for our prior company, Symmetric Computer Systems, and hence knew what was encumbered and what was not (Lions was entirely proprietary). Putting together an OS is a group effort to the max. Making an open source OS requires fortitude and knowledge above and beyond that.

Jalopnik, one of my favorite sites, found the ultimate absurd Figure 1 patents with this little gem of an article: Toyota’s Robocars Will Wash Themselves Because We Can’t Be Trusted. Wow, they really knocked themselves out doing their Figure 1, didn’t they? Womp Womp.

And finally, for a serious and detailed discussion of how the pandemic impacted the medical diagnostic side, I recommend this from UCSF: We Thought it was just a Respiratory Virus. We were Wrong (Summer 2020). Looking back, it was just the beginning of wisdom.

Stay safe, everyone!

Yes Virginia, Neutrinos Do Have a Bounded Mass (Thanks to Big Data)


Many years ago, Jim Gray gave a talk at Stanford that I attended, in which he outlined the challenges in processing the huge datasets accumulated in scientific fields like astronomy, cosmology, and medicine.

In those days, the greatest concerns were: 1) cleaning the data sets and 2) transporting the data sets. The processing of these data sets, surprisingly, was of little concern. Data manipulation was processor-limited and modeling tools were few. Hence, success was dependent on the skill of the researchers to delve through the results for meaning.

Jim lived in a world of specialized expensive hardware platforms for stylized processing, painstaking manual cleaning of data, and elaborate databases to manipulate and store information. As such, large academic projects were beholden to the generosity of a few large corporations. This, to say the least, meant that any research project requiring large resources would likely languish.

In the decades since Jim first broached the huge data set problem (and twelve years after his passing), the open source disruption that started with operating systems (of which I was a part) and new languages spawned in turn the creation of data tools, processing technologies, and methods that Jim, a corporate enterprise technologist, could not have imagined. Beginning with open source projects like Hadoop and Spark (originally from UC Berkeley, just like 386BSD), on-demand databases and tools can now provide (relatively speaking) economical and efficient capabilities. And one of the biggest of big data projects ever recently demonstrated that success.
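Part of what made those tools so accessible is that the programming model Hadoop popularized, MapReduce, is itself simple. A minimal sketch of the pattern in plain Python (the real frameworks add the distribution, fault tolerance, and scale; the word-count data here is just an invented example):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (key, value) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would do
    # across many machines.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

documents = [
    "big data big tools",
    "open source big data",
]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in documents)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

Because map and reduce are independent per key, the framework can scatter them across thousands of cheap machines, which is exactly what moved big data out of the realm of specialized expensive hardware.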
