Does AI Scream at Electric Creeps?

Kathleen McKiernan · Published in The Startup · Jul 5, 2020

A look into artificial intelligence and the machine learning models that underpin so much of our user-generated content

When most people think of machine learning in relation to themselves, something like the auto-correct peppered throughout their texts might come to mind. But these technologies are integrated into many industries that touch us daily. In my previous article, linked below, I covered the broad strokes of machine learning by looking at self-driving cars and healthcare, and briefly touched on the YouTube algorithm. In this article, I'll dive further into that last concept by walking through three different violations of a social media platform's terms of service and the role machine learning plays in mitigating the harm these violations cause.

To fully understand this decision-making behavior, we must go over the basics of these algorithms. Machine learning is built on three different styles of learning: supervised, unsupervised, and reinforcement learning. Supervised learning trains the algorithm on a given set of labeled features; it relies on human-generated guidance, but still requires found data to learn from. Unsupervised learning also starts from found data, but its guidance is machine-created: with no features given, the algorithm has to assign features to unlabeled data based on its previous decisions. Lastly we have reinforcement learning, which relies entirely on feedback. By rewarding the algorithm through a marriage of human and machine-created guidance, systems can refine themselves for maximum efficiency. All of this follows a simple flow: input goes into a model that analyzes the material, and the model produces an output. If the output is incorrect, the result is fed back through the model to correct it, and the cycle repeats until the right decision is made. Below I've included an amazing video by Simplilearn that visualizes these concepts wonderfully.

Machine Learning Basics by Simplilearn on YouTube
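
To make that input, model, output loop concrete, here is a minimal sketch in Python: a toy perceptron (my own illustration, not any platform's real system) that keeps feeding its mistakes back through the model until every labeled example comes out right.

```python
# Toy supervised learning: a perceptron that loops back through the
# model to correct itself until its outputs match the labeled data.
# Purely illustrative; no platform uses anything this simple.

training_data = [            # (features, label) pairs: human-provided guidance
    ((1.0, 0.0), 1),
    ((0.0, 1.0), 0),
    ((1.0, 1.0), 1),
    ((0.0, 0.0), 0),
]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(features):
    """Input -> model -> output."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

# Keep correcting the model until every training example is classified right.
converged = False
while not converged:
    converged = True
    for features, label in training_data:
        error = label - predict(features)
        if error != 0:                      # wrong output: go back into the model
            converged = False
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error

print(weights, bias, [predict(f) for f, _ in training_data])
```

The same loop generalizes: swap the perceptron for any model, and the "go back into the model" step for any update rule, and you have the flow described above.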

In this first violation, we look at what happens when someone uploads copyrighted content. Large social media platforms built on user-generated content have many safeguards against duplication of content from major media conglomerates. Take the Marvel movies, for example: after each box-office hit is released, a blueprint of the film is made to match against content being uploaded. This blueprint flags any material deemed a fraudulent upload, and the file will most likely never finish uploading. This is an example of supervised learning, since the algorithm knows exactly what it is looking for. In some cases, such as lesser-known media, the file may complete the upload and even stay up for a few hours, but eventually it will most likely be taken down, either by the machine learning algorithms or by a user flagging the content. I can personally attest to a recent pandemic-related example, when an acquaintance wanted to host a viewing party for friends. Even though they chose an obscure folk-horror film from 1973 running only eighty-seven minutes, the viewing party lasted a brief forty-five before the algorithm flagged the content and removed it.
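
Content ID-style blueprint systems are proprietary, so as an illustration only, here is a toy sketch of the general idea using a difference hash: each reference frame is boiled down to a 64-bit signature, and uploaded frames are compared against those signatures. The hash variant, frame sizes, and thresholds below are all assumptions for demonstration.

```python
import numpy as np

def dhash(frame: np.ndarray, size: int = 8) -> int:
    """Difference hash: shrink a grayscale frame and record whether each
    cell is brighter than its right-hand neighbor, giving a 64-bit signature."""
    rows = np.array_split(np.arange(frame.shape[0]), size)
    cols = np.array_split(np.arange(frame.shape[1]), size + 1)
    small = np.array([[frame[np.ix_(r, c)].mean() for c in cols] for r in rows])
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# Build a "blueprint" from frames of the protected film (random stand-ins here).
rng = np.random.default_rng(0)
film_frames = [rng.random((72, 128)) for _ in range(5)]
blueprint = {dhash(f) for f in film_frames}

def is_flagged(upload_frames, threshold: int = 10) -> bool:
    """Flag an upload when most frames sit within a few bits of a blueprint hash."""
    hits = sum(any(hamming(dhash(f), ref) <= threshold for ref in blueprint)
               for f in upload_frames)
    return hits / max(len(upload_frames), 1) > 0.5

# A lightly re-encoded copy (small noise) still matches; fresh footage does not.
bootleg = [f + rng.normal(0, 0.02, f.shape) for f in film_frames]
print(is_flagged(bootleg))                                    # True
print(is_flagged([rng.random((72, 128)) for _ in range(5)]))  # False
```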

Example of a video and its corresponding sound waves

Now that these blueprints have proven that they work, it's time to look at what goes into them. First, consider the importance of a file's audio. When viewing content in editing software, as in the example above, one can see sound waves mapped alongside their corresponding visual frames. The waveform shows peaks and valleys, each tied to an on-screen action and its visual representation. To put it simply, every sound has a characteristic expression as a waveform, whether it's a coffee mug set down on a hardwood table or artillery firing across a field. This is why some bootleg copies of films have slightly distorted audio, or frames cropped or vignetted with a graphic, to trick the algorithms into overlooking the copyright violation. The TechCrunch article below discusses exactly this. These failsafes are most reliable on whole uploads in their original form, so if someone is willing to edit the video, or share it in pieces, the algorithm might not be able to remove the content effectively.
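
To illustrate why distorted audio can slip past a match, here is a toy sketch (my own simplification, not a production fingerprinting system) that reduces each audio window to its dominant frequency bin and compares the resulting sequences: light noise leaves the peaks intact, while a pitch shift moves them out of tolerance.

```python
import numpy as np

def audio_signature(samples: np.ndarray, window: int = 2048) -> list[int]:
    """Reduce audio to the dominant-frequency bin of each window,
    a crude stand-in for a real spectral fingerprint."""
    sig = []
    for start in range(0, len(samples) - window, window):
        spectrum = np.abs(np.fft.rfft(samples[start:start + window]))
        sig.append(int(spectrum[1:].argmax()) + 1)   # skip the DC bin
    return sig

def matches(sig_a: list[int], sig_b: list[int], tolerance: int = 2) -> float:
    """Fraction of windows whose dominant frequency bins roughly agree."""
    n = min(len(sig_a), len(sig_b))
    hits = sum(abs(a - b) <= tolerance for a, b in zip(sig_a, sig_b))
    return hits / max(n, 1)

# Demo: a pure tone still matches itself under light noise...
rate = 44_100
t = np.arange(rate * 2) / rate
original = np.sin(2 * np.pi * 440 * t)
noisy = original + 0.05 * np.random.randn(len(t))
print(matches(audio_signature(original), audio_signature(noisy)))    # ~1.0

# ...but a pitch-shifted bootleg drifts away from the blueprint.
shifted = np.sin(2 * np.pi * 523 * t)
print(matches(audio_signature(original), audio_signature(shifted)))  # ~0.0
```

Real systems track many spectral peaks per window and tolerate cropping and time shifts, which is exactly why bootleggers resort to heavier distortion or chopping the upload into pieces.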

Now, in the scenario of re-uploading content from a media conglomerate, repercussions vary. Most people will receive a notice and perhaps a strike on their account, but a repeat offender could face legal ramifications. One thing is worth noting, though: the pervasive theft of individuals' media, whether malicious or "for the views." In this scenario, someone re-uploads content to their own social media without tagging the original poster. Some victims report it through the app, others reach out over email or social media, but the running theme is that the outcome is usually the same: nothing. With the way some platforms are currently structured, their machine learning algorithms and human intervention are coming under fire. Critics argue these algorithms don't serve the people, but instead uplift corporations and sponsored posts. What would that even look like in machine learning terms? This is where the "learning" part comes in. In the previous example, the violation was found via the blueprint given to the system. Here, a violation of sorts may be detected, but the flagged content is also driving a lot of traction to the site. One could argue that, guided by human intervention, the algorithms begin to favor corporate-funded media significantly over the creator with 1,000 subscribers whom you actually follow. Although framed somewhat negatively here, this is an example of reinforcement learning: human intervention guides the way the machine learns, and those previously validated or corrected choices now determine how the algorithm reacts to similar content uploaded to the platform.
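
As a purely hypothetical sketch of that feedback dynamic, the snippet below gives each piece of content a score that drifts toward whatever reward (engagement, human-reviewer approval) it earned before, so past validation directly shapes future ranking. No platform publishes its real reward function; this only illustrates the concept.

```python
# Toy reinforcement-style ranking: scores drift toward whatever earned
# "reward" (engagement, moderator approval) in the past. Illustrative only.

scores = {"corporate_clip": 0.0, "small_creator_clip": 0.0}
learning_rate = 0.2

def reward_update(content_id: str, reward: float) -> None:
    """Nudge a score toward the observed reward (an exponential moving average)."""
    scores[content_id] += learning_rate * (reward - scores[content_id])

# Simulated feedback: the sponsored clip keeps getting engagement and human
# reviewers keep validating it, while the small creator's upload gets flagged
# once and earns little reward afterwards.
for _ in range(20):
    reward_update("corporate_clip", reward=1.0)      # high traction + approval
    reward_update("small_creator_clip", reward=0.1)  # low traction, one flag

ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)   # the corporate clip dominates future recommendations
```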

Machine learning at YouTube, with CEO Susan Wojcicki

With this picture of a heavily biased algorithm in mind, the problem grows even darker when the theft and distribution of malicious media comes into play. In this scenario, something is uploaded to bully or harass a user, and by its very nature it most likely violates the community terms of service. Although people often can't get the media taken down, some do, and if the issue is widespread enough, a blueprint can be made of that particular media to screen against re-uploads. But the issue in this scenario is usually how long something takes to be flagged, even once a blueprint exists. In the worst cases, media can stay up for days, even years, garnering thousands of views and downloads alike, all to the disappointment and anxiety of the person who went through the trouble of trying to prevent these re-uploads. The result is a myriad of negative situations that push people to their wits' end. As seen below, a simple Google search for "remove negative online content" returns a staggering number of results and resources. Even at a glance, one can see just how dangerous things can get for individuals targeted in such a heinous way.

Lastly, in a much more lighthearted scenario, we look at plain old user error. Maybe someone accidentally posts a scandalous photo, or doesn't notice something in the background; the algorithm can use its previous decisions to assign a label to the content and deem it against the terms of service. Here we look at the odd but real problem of people posting their own personal information online: in one example a credit card is in plain sight, in another a medical ID bracelet. Each of these could seriously harm the poster, yet they appear completely accidentally, sandwiched between a string of normal posts. The algorithm can spot key factors, such as the telltale syntax of 16 digits organized in chunks of four, recognize it as someone's personal information, and remove the post. In another scenario, the accidentally posted scandalous photo goes through that same checking system: by weighing the basic mathematical features of the photograph against previously made choices, the algorithm categorizes the contents and determines whether they are appropriate. In these scenarios, unsupervised learning is most likely what would be used, because the content is being checked without a blueprint of what to expect. The algorithm itself is responsible for finding a category for everything, based on its own machine-driven learning.
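
That "16 digits organized in chunks of four" cue is easy to make concrete. The sketch below pairs a pattern match with the Luhn checksum that real card numbers satisfy. Note that it only scans text, since catching digits in a photo would first require OCR, and the single pattern shown is just one common card format.

```python
import re

# 16 digits, optionally separated into chunks of four by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: real card numbers pass it, random digits usually don't."""
    digits = [int(d) for d in reversed(number)]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def flag_card_numbers(caption: str) -> list[str]:
    """Return candidate card numbers found in a post's text."""
    hits = []
    for match in CARD_PATTERN.finditer(caption):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(match.group())
    return hits

# The test number below is a standard dummy Visa number, not a real card.
print(flag_card_numbers("new card who dis 4532 0151 1283 0366"))
```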

Some decade-old internet gold

No matter the scenario, a simple flow of information is happening: from input, the material goes into the given model, then to output. This deceptively simple flow of information only begins to encompass the possibilities machine learning offers. In such a fast-growing world, it is hard to even predict what the technologies of the future will look like. One thing is certain, though: machine learning will be used in almost every industry. These new breakthroughs in efficiency are built on the choices our algorithms are making today. If humans hone these capabilities sooner, and create more successful, unbiased algorithms, the future of social media, and of technology as a whole, could be changed for the better for decades to come.

— — — Written by Kathleen McKiernan for Holberton New Haven — — —

