Pinterest this morning pulled back the curtain on the AI and machine learning technologies it uses to combat harmful content on its platform. The company uses algorithms to automatically detect adult content, hateful activity, medical misinformation, drugs, graphic violence, and more before it's reported. According to the company, per-impression policy violation reports have decreased 52% since the fall of 2019, when the technologies were first introduced. And reports of self-harm content have decreased by 80% since April 2019.
One of the challenges in building multi-category machine learning models for content safety is the scarcity of labeled data, which forces engineers to use simpler models that can't be extended to multimodal inputs. Pinterest addresses this with a system trained on millions of human-reviewed Pins, drawn from both user reports and proactive model-based sampling by the Trust and Safety operations team, which assigns categories and takes action on violating content. The company also uses a Pin model, which is trained on a mathematical, model-friendly representation of Pins based on their keywords and images, and which is aggregated with another model to generate scores that indicate which Pinterest boards might be in violation.
“We improved the information derived from optical character recognition on images and provided an online version of our system in near real time. Also new is the way boards are scored, not just Pins,” Vishwakarma Singh, Pinterest's trust and safety machine learning lead, told VentureBeat via email. “An effective multi-category [model] using multimodal inputs – embeddings and text – for content safety is a valuable lesson for decision makers. We use a combination of offline and online models to achieve both performance and speed, and to provide a system design that is good for others to learn from and generally applicable.”
In production, Pinterest uses a family of models to proactively identify policy-violating Pins. When enforcing policies across Pins, the platform groups Pins with similar images and identifies each group with a unique hash called an “image signature.” Models generate scores for each image signature. Based on these scores, the same content moderation decision is applied to all Pins with the same image signature.
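As a rough illustration of signature-level enforcement, the sketch below groups Pins by a shared signature and applies one decision per group. The article does not describe how Pinterest computes signatures or turns scores into decisions, so the threshold, the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cutoff; the article does not say how scores map to decisions.
BLOCK_THRESHOLD = 0.8


def group_by_signature(pins):
    """Map each image signature to the IDs of the Pins that share it."""
    groups = defaultdict(list)
    for pin_id, signature in pins:
        groups[signature].append(pin_id)
    return dict(groups)


def moderate(pins, signature_scores, threshold=BLOCK_THRESHOLD):
    """Return {pin_id: decision}, with one decision per signature
    applied to every Pin that carries that signature."""
    decisions = {}
    for signature, pin_ids in group_by_signature(pins).items():
        # Signatures without a score are treated as safe in this sketch.
        score = signature_scores.get(signature, 0.0)
        decision = "block" if score >= threshold else "allow"
        for pin_id in pin_ids:
            decisions[pin_id] = decision
    return decisions


# Made-up sample data: two Pins share one signature, a third has its own.
pins = [("pin1", "sigA"), ("pin2", "sigA"), ("pin3", "sigB")]
scores = {"sigA": 0.93, "sigB": 0.10}
decisions = moderate(pins, scores)
print(decisions)
```

Scoring once per signature rather than once per Pin means a single model evaluation covers every copy of the same image, which is presumably part of the appeal of the design.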
For example, one of Pinterest's models identifies Pins that it believes violate the platform's health misinformation policy. Trained using labels from Pinterest, the model finds keywords or text associated with misinformation and blocks Pins containing that language, while also identifying visual representations associated with medical misinformation. According to Singh, it takes into account factors such as the image and URL, and it blocks all such images across Pinterest search, the home feed, and related Pins.
Since users typically save thematically related Pins together as a collection on boards around topics like recipes, Pinterest deployed a machine learning model to produce board-level scores and enforce board-level moderation. A Pin model trained only on embeddings (i.e., representations) generates content safety scores for each Pinterest board. An embedding for each board is constructed by aggregating the embeddings of the most recent Pins saved to it. When fed into the Pin model, these embeddings produce a content safety score for each board, allowing Pinterest to identify policy-violating boards without training a separate model for boards.
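The board-scoring idea can be sketched in a few lines. The article says only that the embeddings of a board's most recently saved Pins are combined, so the use of a plain average here is an assumption, and the logistic scorer is a hypothetical stand-in for the actual Pin model, whose architecture is not described.

```python
import math


def board_embedding(pin_embeddings, latest_n=3):
    """Average the embeddings of the most recently saved Pins.

    `pin_embeddings` is ordered oldest first. Averaging is an assumed
    aggregation; the article does not name the method used.
    """
    recent = pin_embeddings[-latest_n:]
    dim = len(recent[0])
    return [sum(vec[i] for vec in recent) / len(recent) for i in range(dim)]


def pin_model_score(embedding, weights, bias):
    """Hypothetical stand-in for the Pin model: a logistic scorer
    that maps an embedding to a score between 0 and 1."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Made-up two-dimensional embeddings for four Pins on one board.
saved_pins = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
embedding = board_embedding(saved_pins, latest_n=3)

# The same scorer used for Pins is reused for the board embedding,
# so no board-specific model has to be trained.
score = pin_model_score(embedding, weights=[1.0, -1.0], bias=0.0)
print(embedding, score)
```

Because the board embedding lives in the same space as the Pin embeddings, a model trained on Pins can score it directly, which matches the article's point about avoiding a separate board model.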
“These technologies, along with an algorithm that rewards positive content, as well as policy and product updates like blocking anti-vaccination content, banning culturally insensitive ads, banning political ads, and launching compassionate search for mental wellness, form the foundation for making Pinterest an inspiring place online,” said Singh. “Our work has shown the impact that graph convolutional methods can have in production recommendation systems, as well as on other large-scale graph representation learning problems, including knowledge graph reasoning and graph clustering.”