Engineering Insights is an ongoing blog series that gives a behind-the-scenes look into the technical challenges, lessons and advances that help our customers protect people and defend data every day. Each post is a firsthand account by one of our engineers about the process that led up to a Proofpoint innovation.
Proofpoint Intelligent Compliance classifies text content from a wide range of sources, from social media posts to customer-provided requests. One part of our system detects spam, which typically comes from social media sources.
A common challenge for spam detection systems is that adversaries modify their content to evade detection. We have an algorithm that addresses this.
We also need a way to correct false positives. We handle this by maintaining two lists of spam signatures: a positive list of known spam and an exclusion list of signatures that should not be flagged. In this blog post, we explain how we update these signature sets in real time without hurting performance.
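To make the interplay between the two lists concrete, here is a minimal sketch of how a lookup might consult them. The post does not describe how signatures are computed, so this sketch assumes, purely for illustration, that a signature is a SHA-256 hash of lightly normalized text. The names `normalize`, `signature` and `classify` are hypothetical. The sketch also assumes that the exclusion list is checked first, so that a corrected false positive always wins over a positive match.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace.

    A production system would need much more to resist adversarial edits.
    """
    return re.sub(r"\s+", " ", text.strip().lower())


def signature(text: str) -> str:
    """Assumed signature scheme: SHA-256 of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def classify(text: str, positive: set[str], exclusions: set[str]) -> bool:
    """Return True if the text should be flagged as spam.

    The exclusion list is consulted first, so an excluded signature is
    never flagged even if it also appears in the positive list.
    """
    sig = signature(text)
    if sig in exclusions:
        return False
    return sig in positive


# Usage: one known spam message, plus a former false positive now excluded.
spam_sigs = {signature("Buy cheap followers now!"), signature("Limited offer")}
excluded = {signature("Limited offer")}

print(classify("buy  CHEAP followers now!", spam_sigs, excluded))  # True
print(classify("Limited offer", spam_sigs, excluded))  # False
print(classify("Hello team", spam_sigs, excluded))  # False
```

Because both lists are plain in-memory sets here, lookups are fast, but changing either list means changing the process that holds them.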
A need to scale without compromising performance
As the Proofpoint Patrol customer base has grown, we have needed to scale the product to keep our services fast and reliable. Originally, the text categorizer service was integrated with our core classifier service, so it could not be scaled on its own. We split it out into a separate service so that we could develop and scale it independently.
Our first release of this new system allowed us to scale more efficiently and resulted in a large decrease in latency. Part of the performance improvement came from loading the spam signature set into memory at service startup.
However, this meant we could not update our positive or exclusion signature sets without rebuilding and redeploying the application. As a result, the spam system could not learn new spam signatures over time, which would lead to more false negatives.
An in-memory data storage solution: Redis
Shortly after joining Proofpoint, I was tasked with making the spam detection system learn over time while retaining these performance benefits. We needed a solution with low read latency and, ideally, low write latency, since our read-to-write ratio sat at around 80/20.
One potential solution was Redis, an open-source in-memory data store. Amazon offers a Redis-compatible service, Amazon MemoryDB for Redis, which provides durable data persistence beyond what a typical cache can offer.
Overview of an in-memory signature storage solution.
On the performance side, Amazon advertises microsecond read latency and single-digit millisecond write latency for MemoryDB. While investigating potential solutions, we observed similar latencies with our own workload.
We typically have more read queries than writes; however, we have occasional spikes in write queries.
A chart showing read commands over time.
A chart showing write commands over time.
Persisting our spam signatures and exclusion list in MemoryDB would let the system store new spam signatures at runtime, so it could improve over time. We would also be able to respond quickly to false positive reports by updating the exclusion list in real time.
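The runtime-update idea can be sketched by moving both lists out of process memory and into Redis sets. This is a minimal sketch under stated assumptions, not the production code. The key names `spam:positive` and `spam:exclusions` and the `SignatureStore` class are hypothetical. The sketch relies only on three Redis set commands, `SADD`, `SREM` and `SISMEMBER`, which the redis-py client exposes as `sadd`, `srem` and `sismember`. So that the example runs without a server, it includes a small in-memory stand-in with the same method names.

```python
from typing import Protocol


class SetClient(Protocol):
    """The subset of Redis set commands this sketch needs.

    redis-py's redis.Redis client provides methods with these names.
    """

    def sadd(self, name: str, *values: str) -> int: ...
    def srem(self, name: str, *values: str) -> int: ...
    def sismember(self, name: str, value: str) -> bool: ...


class InMemorySetClient:
    """Test stand-in that mimics Redis set semantics with Python sets."""

    def __init__(self) -> None:
        self._sets: dict[str, set[str]] = {}

    def sadd(self, name: str, *values: str) -> int:
        s = self._sets.setdefault(name, set())
        added = len(set(values) - s)  # count only new members, like SADD
        s.update(values)
        return added

    def srem(self, name: str, *values: str) -> int:
        s = self._sets.get(name, set())
        removed = len(s & set(values))  # count only existing members, like SREM
        s.difference_update(values)
        return removed

    def sismember(self, name: str, value: str) -> bool:
        return value in self._sets.get(name, set())


POSITIVE_KEY = "spam:positive"  # hypothetical key name
EXCLUSION_KEY = "spam:exclusions"  # hypothetical key name


class SignatureStore:
    """Reads and writes spam signatures through a Redis-like client.

    Every lookup goes to the shared store, so a signature added by one
    service instance is visible to all instances without a redeploy.
    """

    def __init__(self, client: SetClient) -> None:
        self._client = client

    def learn_spam(self, sig: str) -> None:
        self._client.sadd(POSITIVE_KEY, sig)

    def report_false_positive(self, sig: str) -> None:
        # Exclude the signature and drop it from the positive set.
        self._client.sadd(EXCLUSION_KEY, sig)
        self._client.srem(POSITIVE_KEY, sig)

    def is_spam(self, sig: str) -> bool:
        # Exclusions take precedence over positive matches.
        if self._client.sismember(EXCLUSION_KEY, sig):
            return False
        return bool(self._client.sismember(POSITIVE_KEY, sig))


# Usage with the in-memory stand-in. In a deployment you would pass a
# connected redis.Redis client instead (MemoryDB settings such as TLS are
# outside the scope of this sketch).
store = SignatureStore(InMemorySetClient())
store.learn_spam("sig-123")
print(store.is_spam("sig-123"))  # True
store.report_false_positive("sig-123")
print(store.is_spam("sig-123"))  # False
```

Removing a reported signature from the positive set is not strictly needed, since exclusions are checked first, but it keeps the positive set clean. Each `is_spam` call costs up to two round trips. A read-heavy service could batch them, for example with a redis-py pipeline.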
After finishing our investigation and building a proof of concept, we implemented the changes and found that our response times remained low. We were also able to update our signature sets in real time, which let us respond to spam-related support inquiries more quickly.
Join the team
At Proofpoint, our people—and the diversity of their lived experiences and backgrounds—are the driving force behind our success. We have a passion for protecting people, data and brands from today’s advanced threats and compliance risks.
We hire the best people in the business to:
- Build and enhance our proven security platform
- Blend innovation and speed in a constantly evolving cloud architecture
- Analyze new threats and offer deep insight through data-driven intelligence
- Collaborate with our customers to help solve their toughest cybersecurity challenges
If you’re interested in learning more about career opportunities at Proofpoint, visit the careers page.
About the author
Colby Leclerc is a software engineer at Proofpoint. He recently joined the digital risk team, where he works on machine learning and rules-based classification systems. Colby has worked in the software industry for more than eight years. He lives in Massachusetts and enjoys hiking, writing and learning about disparate subjects.