The Six Five Connected with Diana Blass: Copyright, Deep Fakes, and Data Privacy in the Age of AI

By Patrick Moorhead - July 26, 2023

In a follow-up to the first episode, which investigated the race to AI innovation playing out on the world stage, this episode of The Six Five Connected with Diana Blass dives into the copyright violations, data privacy issues, and potential deception associated with AI-generated content.


Disclaimer: The Six Five webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.


Diana Blass: Food delivery is huge, a market that’s tripled in value since 2017 thanks to the rise of third-party delivery apps. Here, diners are lured to a restaurant based on the photos. But what if I told you some of those photos are fake?

Carl Turner: We call it Swipeby Snapshot. And what it really is, it’s an AI taking menu descriptions, plus some more information that we are feeding in, which can obviously always be adjusted, to then curate a full catalog of pictures for each menu item that a restaurant has on the menu.

Diana Blass: With generative AI, restaurants skip the pricey food photographer to create realistic images in just seconds. Those behind it call it AI food photography. Critics call it deception.

Daniel Newman: Generative AI can help companies do things like this faster, but I would always argue that the authenticity of the photos and menus tend to leave something to be desired.

Diana Blass: A debate that’s coming to the spotlight as AI increasingly distorts our reality, oftentimes raising copyright and data privacy issues as it does so.

Alexis Keenan: OpenAI faces a growing list of lawsuits, the latest coming from comedian and author, Sarah Silverman. Silverman accusing the artificial intelligence system of copyright infringement.

Diana Blass: Protests, lawsuits, and investigations are emerging as many of us wonder, where can AI learn without violating the rights of others?

Kirk Sigmon: What scares me about that is that you can regulate your way into killing this industry before it even begins.

Diana Blass: This as business opportunities soar.

Satya Nadella: Meta’s Llama 2 coming to both Azure and Windows.

Pat Moorhead: This enables almost anybody to use this.

Diana Blass: A new playing field has emerged, and the question is, are we ready?

Hi everyone. I’m Diana Blass. Welcome back to Connected, brought to you by Six Five Media, the series that connects you to the latest buzz in tech. In our last episode, we investigated AI innovation on the world stage as we discussed the regulations potentially needed here in the US to keep up. Today we continue the generative AI deep dive, this time exploring the business opportunities, use cases, and the legal conflicts that are emerging as a result. So why not start with the negatives, right? The FTC has sent OpenAI a 20-page letter with questions and demands related to its ChatGPT service. It’s the first official inquiry into generative AI. The agency hopes to uncover whether the company engaged in unfair practices related to its data collection methods.

Some of those questions centered around the tendency of the bot to hallucinate, or make up responses when it doesn’t know the answer. There’s also the fear that it’ll share sensitive or false information about a person or a company as part of its responses. It comes as comedian Sarah Silverman files a lawsuit against OpenAI and Meta, alleging that they used a pirated copy of her book to train their models. It’s one of many lawsuits of its kind as questions around copyright, consent, and compensation get louder with the growth of generative AI. Our first guest doesn’t want to say he told you so, but Mr. Pat Moorhead, you kind of called it.

Pat Moorhead: I like to look at this in the same way that we looked at search when it rose up in the late nineties. And in fact, I actually worked at Alta Vista. That was the number one search engine before Google came along. And we were having these same discussions, which was, “How can you come in and hoover up all my data and then monetize it and make money off of it? What’s in it for me?” And the back and forth on that was that Alta Vista and Google would send links back, and search replaced this Yahoo directory method, which was to go through and click on the link that you want in automotive and find what you’re looking for.

And we’re experiencing the same thing now. The quid pro quo that’s not understood is: what is the value when you come in and take all my information? Any model that trained on data that wasn’t public could very much be in trouble. But just like we saw then, I can see that companies are going to have to cut deals with content providers to provide them some value, and this is all going to go away.

Diana Blass: Let’s learn more as to how these legal fights could unfold. Let’s turn to Kirk Sigmon, IP litigator and patent prosecutor with Banner & Witcoff in Washington, DC.

When the FTC inquiry specifically came down last week, did it surprise you at all?

Kirk Sigmon: To some degree, because it’s predicated a little bit on an interesting perspective, at least with respect to OpenAI’s operations. So a lot of this request that they sent over to OpenAI asks questions about stuff that others have noted is a little bit strange given the context of what the FTC normally does. They seem principally concerned with, and I’ll read from the document here, personal information, but other stuff like defamatory information, anonymizing information. It’s very interesting because, as others have noted, typically defamation and other topics like that are handled by state courts, and yet the FTC seems very concerned about this. Now, that might be giving this short shrift. They’re also very concerned with the recent data breach that occurred with respect to OpenAI, where, for example, you could see other people’s search histories. But it does seem as if they’re very concerned about the output of the large language model, and particularly whether or not it could be insulting or harmful or hateful or whatever.

It’s an interesting perspective because it’s something that, at least in my view, is a little bit novel for them. They’re providing a product that is, as far as I can tell, fairly well documented as not being perfect. It’s a computer program, effectively, but they’re concerned with the output of it that might ultimately be used by end users. It’s not something I’ve seen before; it’s a novel approach. Of course, this investigation doesn’t promise they’re going to go after anything, and it certainly doesn’t promise that they have a specific legal theory in mind, but it’s a little surprising.

Diana Blass: Obviously there are so many changes that we have yet to see and I’m sure regulations and restrictions will come down, but it will likely take years for that to happen. So how do you see the current lawsuit by Sarah Silverman and others unfolding right now because are the laws even in their favor?

Kirk Sigmon: So the Sarah Silverman case is very interesting, and we really have to be very precise about what her argument is, because it has some interesting dimensions, but it also has ones that are, I think, weaker. So just to recap very briefly, she’s effectively alleging that Meta and OpenAI used her book as part of training and that the resulting models themselves are scarily good at replicating her book. It can summarize it very, very well, suggesting it had access to that book at one point. And specifically, I mentioned earlier, but I’ll mention it again, part of the allegations suggests that they may have pulled from Z-Library, that semi-legal source of books.

The concern with that is that there are cases out there already saying pretty definitively that fair use does cover use of copyrighted content when it’s transformative, when it’s de minimis. An example of this would be Google Images, which was once the recipient of a lawsuit from a company called Perfect 10 over the thumbnails it used in Google Images — the ability to see a thumbnail of copyrighted content. That was a massive battle that Google was ultimately victorious in. To that end, I would say that there is a genuine concern that her claims assume a level of protection for copyright that does not exist. The use of her book to train nodes of an artificial neural network in a very mathematical way is very attenuated. It’s not as if there’s some copy of her book stored on an OpenAI server, although that may be the case.

I think what’s more interesting, though, is that her allegations revolve around the idea that it’s unusually specific about her book. In some cases, that could be because it read a lot of reviews of her book. On the other hand, it could suggest that OpenAI is pulling from a database that includes her book, which would be a much more interesting argument. But at the end of the day, the bigger issue here is that her allegations, if the class is certified, could kill AI, because one concern with artificial intelligence in general, with machine learning in general, is that you need a lot of data. For example, if you take classes in AI or ML in school, you have to deal with terabytes and terabytes of data just for small student projects. Needless to say, OpenAI and ChatGPT have to use a substantially larger amount of data to do what they do.

And if you tell them, “Well, you have to go get the license for every single bit of content you use,” my God, when I was studying this back in the day to do a dog recognition algorithm, I think I used 10,000 images. I would’ve spent more time writing checks to various photographers than I would have actually writing the code. And God forbid, that would kill OpenAI, because I’m sure they’re using petabytes of data to do this. So it’s an interesting one because it does touch on copyright and the extent to which it’s protected. I do think this is basically a fair use argument that OpenAI is going to have to make.

Diana Blass: Great. Thanks, Kirk, for your insights. To learn more about how businesses are incorporating generative AI platforms into their services and navigating the legal pitfalls, let’s turn to our company profile in today’s episode. This is Swipeby, a software company that specializes in turning any brick-and-mortar retail shop into a connected smart store. Originally, Swipeby offered businesses, particularly restaurants, a way to manage pickup and deliveries with an integrated software platform. But now with AI, its services have expanded massively to include offerings like marketing, price optimization, and photography. And here’s what’s interesting: when I conducted my interview with Swipeby founder Carl Turner about three or four months ago, many of those services didn’t exist yet, so it’s fascinating to see just how quickly AI has transformed his business, and his clients’ too.

Today’s interview focuses on the AI food photography service that leans into generative AI to turn any food description into a photo. It was controversial when it launched, with some food delivery apps and other critics slamming it as deception that promotes fake food, since diners may not realize that the images they see aren’t showing the actual dish. But here’s the thing, restaurants that add pictures to their menus receive upwards of 70% more orders and 65% higher sales compared to the restaurants that don’t. Let’s learn more. Here’s my interview with Turner.

Carl Turner: We call it Swipeby Snapshot. And what it really is, it’s AI taking menu descriptions, plus some more information that we are feeding in, which can obviously always be adjusted, to then curate a full catalog of pictures for each menu item that a restaurant has on the menu.

Diana Blass: How did you go about building this solution into your technology stack? Are you using ChatGPT, or is this another, similar type of algorithm?

Carl Turner: So we do use technology from OpenAI. We also use some from a company called Stability AI. So we use their underlying models, but tightly integrated into our solution. What that means is, yes, we have not built ourselves a Stable Diffusion model. We are applying the AI models that are out there in the market. But again, I think the key point is how tightly integrated this is. It’s one button press, and the image is there. If you don’t like it, regenerate it, do a little bit of adjustment. But generally, the first try is the charm. It’s really good on the first try if the description is somewhat complete.
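Turner’s pipeline — existing menu copy in, catalog image out — can be sketched roughly as follows. This is an illustrative assumption, not Swipeby’s actual code: the helper name and style wording are invented, and the call to a hosted image model is only indicated in a comment rather than executed.

```python
# Hypothetical sketch of the "menu description -> generation prompt" step
# Turner describes. The function name and style hints are assumptions.

def build_food_photo_prompt(name, description,
                            style="overhead shot, natural light"):
    """Compose a text-to-image prompt from an existing menu entry."""
    # The menu copy already written for customers doubles as the prompt
    # body, which is why Turner says the time-consuming part is done.
    return f"{name}: {description}. Professional food photography, {style}."

prompt = build_food_photo_prompt(
    "Baja Fish Tacos",
    "grilled mahi-mahi, cabbage slaw, chipotle crema on corn tortillas",
)
# The prompt would then go to a hosted text-to-image model, e.g. one of
# the OpenAI or Stability AI endpoints Turner mentions (call not shown).
print(prompt)
```

Regeneration, as Turner describes it, would simply be re-running the same prompt (optionally with an adjusted description) until the restaurant likes the result.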

Diana Blass: So what do you say to the critics out there who feel you’re promoting the rise of fake food photos, potentially deceiving consumers or customers about what the food really looks like?

Carl Turner: Yeah, I would have a couple of points. I would say, number one: valid concern, certainly. You don’t want to put something on there that is not real. I would say two other things, though. One is, if we look again at the big guys, tell me one fast food restaurant where that picture is actual food photography. Yes, it is, but it has been styled afterwards. There are ingredients in there that are not even in the actual food. So why would we hold accountable the little guy, the SMB restaurant, the one-to-ten-location restaurant owner, versus the, I don’t know, 200,000-restaurant chains that day to day are putting out pictures that are, in the end, basically AI-generated?

Diana Blass: And just to give more perspective on how much of a game-changer this type of solution is, can you give some feedback on how long it takes to generate these images? What kind of information do you really need to provide? Just really pinpoint how this isn’t time-consuming.

Carl Turner: Yeah. No, I think it’s an absolute game-changer. And let’s start with this: today, there is a reason why the majority of our customers, 80 to 90%, don’t have full menu pictures. Maybe there are two or three items, but a menu where 50% of items or more have pictures is virtually nonexistent. And the reason is not that they didn’t upload pictures, or didn’t share pictures to upload, or didn’t put them in the POS, as we might think. The reality is they just don’t have them. And why don’t they have them? Because they need to make all the food. They need to pay a photographer to come in. And for a lot of them, they just don’t have the ROI on that, because they need the five pictures for their printed menu to look cool, then two for some social media posts and for their website, and that’s what they have.

With AI now, the first thing is, one of the hardest parts about picture generation and AI, the time-consuming part, is writing up the content. The content is there, because we have a full online menu with descriptions for the consumer right now. And generating an image takes seconds, if not milliseconds. If it’s a very complex description, it could go up to two seconds per image. But for what we are talking about here, a typical restaurant menu of approximately 50 items or so, it will be less than five minutes.

Diana Blass: Okay, interesting. So looking ahead, last question here, how do you foresee yourself further integrating ChatGPT into your software?

Carl Turner: Yeah, so we are looking at it again as an online ordering platform where we have customer data, ordering trends, we have specials, we have loyalty, we have discounts, we have the pictures, we have the descriptions. That gives us really everything we need to create really great, engaging social media content. And how that could look is that the restaurant owner can say, “I want to post twice a week.” Now, twice a week they get a text message with an AI-generated image and some copy, which is super relevant to that day, because we know that’s the day they have Taco Tuesday or that’s the day the kids’ items are 5% off, because that’s set in our system, and then they can make a unique post. And the only thing they need to reply is yes or no. So it’s still fully in their control, but a fully automated system.
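The approval loop Turner describes — draft a post from the day’s promo data, text it to the owner, publish only on a “yes” — can be sketched as below. Every name here is hypothetical, and SMS delivery, image generation, and publishing are stubbed out as callables; this is a sketch of the workflow, not Swipeby’s implementation.

```python
# Minimal sketch of the yes/no approval flow: the owner keeps control,
# but everything up to the final reply is automated.

def propose_post(restaurant, today_promos, generate_image, send_sms):
    """Draft a post from today's promo data and text it to the owner."""
    promo = today_promos[0]                       # e.g. "Taco Tuesday"
    copy = f"{restaurant}: {promo} today. Order now!"
    draft = {"image": generate_image(promo), "copy": copy}
    send_sms(restaurant, f"Post this? Reply YES/NO: {copy}")
    return draft

def handle_reply(draft, reply, publish):
    # Nothing goes out without an explicit YES from the owner.
    if reply.strip().upper() == "YES":
        publish(draft)
        return "published"
    return "discarded"
```

The design choice worth noting is that the system proposes and the human disposes: the automated pipeline never publishes on its own, which is what lets Turner call it “fully in their control, but a fully automated system.”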

And I think a lot about customer engagement. When we think about it, it’s really important to reply to your reviews online, on Yelp, on Google. Algorithms talk with algorithms. How do restaurants get ranked? Obviously, the number of reviews and how good they are, but also whether you reply to them. And the ranking algorithm of any of the big review sites doesn’t really know who replied, as long as someone replied. And as we have seen with ChatGPT, a system that can pass the bar exam and international medical exams will likely be able to say, “Thank you for your five-star review. Please come back.” So that covered a lot, but these are the features we are thinking about here in the next six months.

Diana Blass: All right. Well, big thanks to Carl Turner for his perspective there, and it’s also interesting to point out that it took way less than six months to get those services he mentioned at the end off the ground. Crazy, right? Now, as you heard in that interview, Swipeby built its model to generate these images using Stability AI and OpenAI, two companies now in the spotlight over their data collection methods. OpenAI, as we mentioned at the top of the show, is under investigation by the FTC, and Stability AI is currently fighting a lawsuit from the stock photo source Getty Images, which claims the company illegally trained its models on its photo library.

Turner mentioned that Swipeby has access to its own pool of data from work with other clients, so I’m not trying to tie the company to these allegations, but it certainly underscores the risks as we consider how regulators will look at a company’s use of AI. And adoption in the commercial space is only expected to rise following news from Meta, which will make its open source AI model Llama 2 available for free to businesses and researchers. Let’s bring back Six Five’s Pat Moorhead for insights.

Pat Moorhead: So first off, I think it would help to even define what these models are. Think of the model as the brain of generative AI, and that brain is trained against a certain kind of data. Llama 2 is one of these large language models: billions of parameters, which basically means variables, and it’s trained on a very diverse set of data. Think of it as having been trained on the internet. And models can be open and models can be closed. Open models, first of all, are free, and you can use them with certain restrictions. And closed models are exactly as the name says: closed. So what I expect this to do is democratize large language models, because it can cost up to $100 million to train one of these large language models, which puts it out of reach of pretty much anybody except maybe the top 50 organizations on the planet.
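Pat’s scale point can be sanity-checked with a back-of-envelope calculation using the common estimate of roughly 6 FLOPs per parameter per training token. The parameter and token counts below come from Meta’s published Llama 2 figures; the hardware utilization and dollar-per-GPU-hour numbers are illustrative assumptions, not Meta’s.

```python
# Rough training-compute estimate for Llama 2's largest variant.
# Assumptions: ~6 FLOPs per parameter per token, 50% hardware
# utilization, ~$2 per A100 GPU-hour (cloud list prices vary widely).

params = 70e9        # Llama 2 70B: 70 billion parameters
tokens = 2e12        # Meta reports ~2 trillion training tokens
flops = 6 * params * tokens           # ~8.4e23 total training FLOPs

a100_flops = 312e12  # peak bf16 throughput of one NVIDIA A100
utilization = 0.5    # assumed: real runs land well below peak
gpu_hours = flops / (a100_flops * utilization) / 3600
cost = gpu_hours * 2.0                # assumed ~$2 per GPU-hour

print(f"{gpu_hours:,.0f} GPU-hours, ~${cost / 1e6:.0f}M of compute")
```

This lands around 1.5 million GPU-hours, in the same ballpark as the roughly 1.7 million GPU-hours Meta reported for the 70B model, or a few million dollars of raw compute for one run. Pat’s up-to-$100-million figure plausibly covers the full program (failed experiments, data work, staff) and larger frontier models, which is why training remains out of reach for most organizations.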

Diana Blass: Isn’t there some controversy, though, associated with open source? Elon Musk and a lot of critics of the technology paint open source as unleashing the devil. I guess you can’t control it as much.

Pat Moorhead: Well, it’s interesting. I think it supports the democratization, but the challenge is that you also get it into bad people’s hands. Just as easily as a small business could come around and use this thing, a hacker or the hacking-as-a-service folks around the world could use it. And I think, like chips, there will be export controls on certain models that come out of the US and on who can use them, but it’s not something like a nuclear reactor. It’s software that can be moved around over FTP. It can be shared on a thumb drive. So there is a risk to it, just like Elon Musk has talked about.

Diana Blass: And the incentive for Meta, what’s the incentive for them to do it for free?

Pat Moorhead: Yeah, so the incentive goes back probably almost 10 years to Facebook’s strategy when it came to infrastructure. Facebook realized when it was a smaller company that it couldn’t operate at the scale of, let’s say, a Google or a Microsoft. And when I’m talking infrastructure, I’m talking servers, storage, networking. So they helped create a standard called OCP, the Open Compute Project, which got all of these people in to co-develop infrastructure. And not only would Facebook, now Meta, use it, but even some banks would use this infrastructure. And then they open-sourced a low-level framework they invented called PyTorch. What that does is create a market for development and developers, so Meta doesn’t have to foot the entire bill. And what Meta wants to happen, and they’re not going to use these words, trust me, is for more people to be working on Llama, to create more developers out there, to lower the cost for them, and to increase the capabilities they can use inside of Meta.

Diana Blass: Okay. Well, thanks, Pat, for your insight. It’s clear we’re seeing what you could describe as two worlds emerging within generative AI. One that’s open and another that’s closed, each differing in their approach to data collection and applications, but together they’re sure to have massive impacts. So how can we leverage the strength of both open source and closed source models to get there? It’s a question we’ll continue to investigate on Connected, but next week I’m excited to introduce you to a city behind the United States’ first quantum network. What will this technology do for the city that’s become known as a Silicon Valley of the South? Be sure to follow along so you can stay connected to that story and others. Till next time, I’m Diana Blass.

Patrick Moorhead

Patrick founded the firm based on his real-world technology experiences and the understanding of what he wasn’t getting from analysts and consultants. Ten years later, Patrick is ranked #1 among technology industry analysts in terms of “power” (ARInsights) and “press citations” (Apollo Research). Moorhead is a contributor at Forbes and frequently appears on CNBC. He is a broad-based analyst covering a wide variety of topics including the cloud, enterprise SaaS, collaboration, client computing, and semiconductors. He has 30 years of experience, including 15 years of executive experience at high-tech companies (NCR, AT&T, Compaq, now HP, and AMD) leading strategy, product management, product marketing, and corporate marketing, including three industry board appointments.