As part of the ReliaQuest SOC Talk series, I had the opportunity to sit down with CTO Joe Partlow to talk about the top security concerns when migrating to AWS. In this video, Joe and I discuss the biggest challenges and opportunities in securing the cloud, in technical detail.
The key topics we cover are:
- How to stay compliant in the cloud
- How to get visibility into IAM and build least-privilege policies
- Key Log Sources for AWS Security & Visibility
Near the end of the video, we cover both cloud-native and vendor-built AWS security approaches.
SOC Talk – Cloud Mini Series – AWS from RQTV on Vimeo.
Transcript:
Joe Partlow: [00:00:00] Thanks for joining us today. My name is Joe Partlow, CTO here at ReliaQuest, a cybersecurity company that’s focused on delivering successful outcomes to our customers. Welcome to this edition of SOC Talks. It’s a program formed by the security operations community: SOC Talks, Conversations from the Trenches. It’s a series of discussions where we talk to security practitioners who are fighting the good fight every day. They share how they’re facing up to the challenge of threat detection, analysis and response, and how they’re leveraging their tools and techniques in the process. And today we are going to be discussing AWS cloud security. So I’m excited to have James Berthoty, senior cloud security engineer from ReliaQuest, with us today, specifically around the criticality of visibility in securing the cloud. A couple of housekeeping things. Everyone’s muted, but if you have any questions, just enter them in the Q&A window and we’ll do those either at the end or, if it makes sense to bring them in during the talk, we can do that too. So just a quick section on ReliaQuest and who ReliaQuest really is: we’re a leading cybersecurity company, close to 800 employees worldwide, based out of Tampa, Florida. Our mission is simple: we make security possible. Anyone would agree that the security operations landscape is pretty complex. Organizations are at various stages of maturity. Some don’t have all the tools and practices necessary, and maybe they do.
Joe Partlow: [00:01:28] They’re still in silos, underutilized or not optimized to face a dynamic threat landscape. That could be because of a skills gap, resource shortages, or a variety of other issues. But despite the innovations in all of the technologies that we have, we still want to make sure that we don’t fall behind. So at ReliaQuest, we help our customers get ahead with security operations so they can protect the enterprise. We are basically the force multiplier for your security team, so you can manage that cyber risk better regardless of your security maturity. Simply put, we make your security operations center work better for you so you can do more with less. That being said, how do we do that? We combine technology with services so that customers, regardless of their maturity, can grow and improve their programs at their own pace. ReliaQuest GreyMatter is our cloud-native security operations platform. We deliver it as a service, 365, any day of the week, any place in the world. It’s built on an open XDR architecture. It has bidirectional integration across pretty much any vendor solution, whether it’s on premises or in a multi-cloud environment, and it’s going to ingest data and automate actions. It brings together that telemetry from any of the security and business solutions to give that singular visibility across the enterprise.
Joe Partlow: [00:02:53] It’s also going to unify your detection, investigation, response and resilience so you can kind of manage that security risk a little bit better based off our experience managing security for customers. We have kind of these best practices that we deliver automated detection content packages so that your team can be a little bit more agile, reducing the noise, automate your response actions so that you can focus on the task of running the business in those higher priority initiatives. All right. That being said, let’s get started. So obviously, you know, every environment that we talk to, every customer that we talk to, you know, cloud is is a factor. And more times than not, it’s multiple cloud AWS we’re going to focus on today. That’s one of the big three Azure and GCP being the others. Obviously there’s a variety of other ones. You may have Oracle Cloud, you may have any kind of proprietary vendor. So obviously there’s there’s a lot of cloud environments that you’re dealing with. Each one of these kind of bring up a different perspective on how to manage security. So James, I’ll kind of turn this over to you. What are you seeing kind of some of the challenges and opportunities specifically around AWS that that the teams are facing?
James Berthoty: [00:04:10] Yeah. Thanks, Joe. Good to talk to everybody. So really, my background started in the data center world, helping mid-size to larger companies do a lot of the legacy security setup, from the firewalls and the data centers to all the vCenter configurations and stuff like that. But as things have moved to the cloud and I’ve gotten more involved with AWS security specifically, the beauty of AWS adoption was the fact that they have this free tier, right, where you could sign up, put a credit card in, and every multimillion dollar company started as a t2.micro server that quickly ballooned into a multimillion dollar annual contract with Amazon. And so really all of the challenges around cloud security have to do with that exact use case. They’re not like a Microsoft or a Google, where they have their own identity provider out of the box, like Google Workspace, which a lot of people use, or Office 365 or the Azure AD stuff. Those other clouds have a more built-in identity provider. But because it’s so easy to get up and running so quickly, really the challenge is that security, which was usually a small team that were the gatekeepers to code getting to production and to infrastructure getting stood up through the vCenter configs and that on-prem setup, now finds that all of a sudden everyone has access to everything through the APIs. All you need is a credit card to get started. Production apps are getting spun up, code is getting deployed very quickly. And so from a security perspective, that’s a challenge. Obviously, from a business perspective, there’s a ton of opportunity there. But I think what the security community is starting to realize, and this runs from compliance to incident response, across the entire gamut of security, is that there’s also a lot of opportunity to force multiply our small teams and really take advantage of the cloud.
James Berthoty: [00:06:09] And so a lot of the challenges around the speed of deployment are really opportunities, if you have the right technologies and partners and setups in place. That means being able to do stuff like real-time compliance monitoring, so you’re not just running a big audit report once a quarter, but you’re actually monitoring in real time and doing incident response against changes in configs. It means consolidating your IAM: instead of all these disparate tools, consolidating it all through your SSO providers and using that to manage the permissions to your data center, and not just the permissions to different pieces of your Windows machines and your Windows infrastructure. And so there’s really a ton of opportunity to respond to stuff faster, to detect things faster, to stop things faster. But it really is a monumental challenge to shift security left, so that security is no longer this gatekeeper at the end of the process, but someone who sits alongside and brings value to the org the same way any team can: increasing deployment speed, putting logical structure on things, putting well-defined roles in place. And so really what we’re all about, what a lot of cloud providers are all about, what AWS is all about, is enabling greater speed of deployment to meet customer needs and focus on those outcomes. And security, which is really exciting, can be a part of that instead of a blocker to that, which is what it’s traditionally been seen as. So yes, there are these challenges, but they are also really opportunities to move towards.
Joe Partlow: [00:07:45] Yeah. So, you know, obviously, James, you kind of talked about it. It’s funny, you know, everybody’s starting out with that free tier and everybody struggled with the micro instances. You know, obviously that kind of changes over time, right? So what have you seen as typically the time frame? You know, how fast do various orgs progress from that micro instance, more of a test or beta environment, to production? Because a lot of times we see that turn into production immediately. And are any of these challenges or opportunities more specific to that entry stage versus the later stage?
James Berthoty: [00:08:25] Yeah, I think first of all, in terms of that, that even the idea of the micro is outdated at this point in terms of the speed of Docker files and microservices containerization where you’re really not spinning up a micro for testing, but you are able to set up a Kubernetes cluster that can serve ten users, but can scale up to millions of users and have the same code as is able to do that scaling very quickly. And so stuff can really get to production as soon as possible, very, very quickly. Have production grade infrastructure. And then as far as the really when you’re getting started, it’s important to think of everything that you’re creating in AWS as the possibility of creating technical debt where even like just this morning I was creating a new S3 bucket and it defaults to not encrypting it. And of course right now, right, I’m just using it to send a CSV there and it’s like, All right, well, I don’t need to encrypt that. That’ll be fine. But then all of a sudden I bet at some point that S3 buckets are going to be used to store something that maybe is kind of sensitive data. And then we’re going to have to go back and revisit it. And so really, if you can bake in a lot of the security oriented thinking upfront, it’s going to make it a lot easier to not have to go back and worry about like, wait, when I turn on encryption for my Kubernetes nodes, am I going to have to worry about a performance hit because you’ve always been encrypted and so you know that it’s fine. So the earlier you can start with the security and compliance thinking in mind, that’s going to be really helpful.
Joe Partlow: [00:09:53] Yeah, no, that’s a great point. And I think most of the breaches that we’ve seen that have involved AWS or cloud in general have come down to configuration issues. And I think it’s a lot of that: you know, hey, this was a test environment that immediately went to production and you obviously didn’t have the appropriate safeguards. So we’re definitely seeing that across the customer base as well. All right. So obviously with AWS specifically, you know, there are three areas that we’re looking at: compliance, identity management, and then log sources and how we ingest those. So, keeping that in mind, one of the topics that we’ll talk about is obviously that compliance extends into the cloud. So if you are HIPAA or PCI and you have those resources in the cloud, you need to make sure that you’re following those rules and extending that compliance footprint out into the cloud. Obviously, identity, which we kind of touched on, is another one; it’s a huge piece to manage. And then obviously from a visibility standpoint, you know, there are a lot of components in AWS that you can turn on, and that’s growing every day. So it’s about making sure that you’ve got visibility into that. Real quick on this one, James, I’ll turn it over to you for any thoughts, but what are you seeing as the most critical components in AWS from a visibility standpoint? You threw out Kubernetes as one of them. Are there any other key areas that you see are heavy lifts, either from a compliance or just a visibility standpoint?
James Berthoty: [00:11:34] Yeah, I think we’ll definitely touch on this more throughout. But at a high level, IAM is definitely the number one thing, because it’s again that drawback and opportunity piece. The drawback is it’s super complicated, because you have JSON-based policies where you can be very granular with permissions, and one star too many and all of a sudden someone can assume role into any account in the environment because you forgot to limit it to a specific resource. But the plus side is you can do a lot of automations and cool stuff around making it so that access keys are getting automatically rotated. Hopefully you’re using role-based assignments, things like that. But then it’s not just that piece; the other thing that connects with IAM is CloudTrail, where everything in AWS gets logged, every user action that happens. Even every role that’s being assumed by a service gets a CloudTrail log. And so making sure that you have good content and good visibility into that service will cover a wide range of potential options. So really, as we’re building out a security program over time, I always start with IAM and CloudTrail, and then we’ll talk about a couple of easy out-of-the-box things to turn on. And then from there I expand into, all right, I want Lambda-specific logs, Lambda-specific content, Kubernetes-specific content. Yeah. The only other thing I’d mention is that in the world of containerization, it’s very important that you drive your endpoint protection with containers in mind, because so many people have a false sense of security where they have zero vulnerabilities, zero findings detected, zero incidents to report, but it’s because they have no visibility into their containers. They’re just seeing the root-level node logs.
And so it’s very important that you talk to your vendors about making sure that you have containers, container visibility across all of your endpoint tools.
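The "one star too many" problem James describes can be illustrated with a small check over an IAM policy document. The following is a minimal sketch, not a complete policy analyzer: it only inspects identity-based `Allow` statements, ignores conditions and `NotAction`/`NotResource`, and the function name, account ID and role name are hypothetical.

```python
from __future__ import annotations

from typing import Any


def _as_list(value: Any) -> list:
    """IAM accepts either a single value or a list for these fields."""
    return value if isinstance(value, list) else [value]


def find_broad_assume_role(policy: dict) -> list[dict]:
    """Return Allow statements that permit sts:AssumeRole on every resource."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        resources = _as_list(stmt.get("Resource", []))
        allows_assume = any(a in ("*", "sts:*", "sts:assumerole") for a in actions)
        if allows_assume and "*" in resources:
            findings.append(stmt)
    return findings


# Hypothetical policies: the first forgets to scope the role, the second does not.
too_broad = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": "*"}
    ],
}
scoped = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::111122223333:role/ExampleTier1",
        }
    ],
}

print(len(find_broad_assume_role(too_broad)), len(find_broad_assume_role(scoped)))
```

A check like this would typically run wherever policies are reviewed before deployment; the only difference between the two sample policies is the `Resource` value.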
Joe Partlow: [00:13:33] Yeah, yeah, definitely a lot of topics that certainly come up in the conversations that I’m having as well. All right. So if we focus specifically on compliance in general, you know, obviously there are a few key opportunities that we have. Compliance is a lot of times the driver that security teams or risk teams have to go off of. But really, security is the goal. I think a lot of prior lessons have told us that compliant doesn’t necessarily mean secure. So keep that in mind and make sure that’s part of your plan. Obviously, there’s alerting on that configuration drift; you know, it is a dynamic environment, so how do you keep up with that? And then there’s starting with the end in mind. Obviously, if the goal is to have a secure app, then, as we touched on with that "hey, let’s start out with this test or beta" approach, it’s very easy to get off track. And then obviously another opportunity is that shared responsibility: what’s the responsibility of you or your teams on your side, and then what’s on the AWS side. So, James, I don’t know if you want to jump in and maybe give a couple of examples of where you’ve seen this go wrong, or well, and what some gotchas to look at are?
James Berthoty: [00:14:53] Yeah, for sure. I think especially, you know, I have a lot of background in SOC 2 compliance, and really the cloud makes it more important than ever that you’re driving your compliance program off of security being the target. Because the speed at which cloud technologies are developing really is making it so auditors are very much struggling to keep up. And so more and more you want to just make sure that you are targeting the security within the scope of your application and really following the data very closely. Because otherwise, I’ve never seen an auditor look at specifics within a Lambda function, for example. Right? Because they’re just not used to it yet across the board. But what they are going to be checking is your S3 bucket configs, your IAM configs, whether you have an identity management system set up. And so really starting with where the main security places to look are is going to be very important to passing that compliance outcome. Right? So with the move to cloud, I just think about keeping as narrow a focus as you can in terms of where my risk is and what the main services are that I need to be paying attention to, and really focusing there. And then on the flip side of that is being able to alert on configuration drift. This is where we really have started to treat compliance more and more with the power of incident response, where I can now have an incident that fires when someone, either mistakenly or as an attacker, tries to decrypt something, open their permissions up, or edit an IAM policy. It’s a compliance issue in the sense that you always want to respond to it and fix the issue. But it’s also a security incident response finding, in that it could be an attacker doing it. And so it’s just allowing that extra level of validation to take place.
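Treating configuration drift as an incident can be sketched as a filter over CloudTrail records. This is an illustrative example only: the watch list is a small, assumed selection of S3 and IAM API calls rather than a complete detection set, the function name is hypothetical, and the sample record is invented. The field names (`eventSource`, `eventName`, `userIdentity`, `awsRegion`, `eventTime`) follow the CloudTrail record format.

```python
# Assumed, deliberately short watch list of config-changing API calls.
DRIFT_EVENTS = {
    "s3.amazonaws.com": {"DeleteBucketEncryption", "PutBucketAcl", "PutBucketPolicy"},
    "iam.amazonaws.com": {
        "AttachRolePolicy",
        "AttachUserPolicy",
        "CreatePolicyVersion",
        "PutRolePolicy",
        "PutUserPolicy",
    },
}


def drift_findings(records):
    """Return one finding per CloudTrail record that matches the watch list."""
    findings = []
    for record in records:
        watched = DRIFT_EVENTS.get(record.get("eventSource"), set())
        if record.get("eventName") in watched:
            findings.append(
                {
                    "who": record.get("userIdentity", {}).get("arn", "unknown"),
                    "what": record["eventName"],
                    "region": record.get("awsRegion"),
                    "when": record.get("eventTime"),
                }
            )
    return findings


# Invented sample records for illustration.
sample = [
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "DeleteBucketEncryption",
        "awsRegion": "us-east-1",
        "eventTime": "2022-01-01T00:00:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"},
    },
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject", "awsRegion": "us-east-1"},
]

for finding in drift_findings(sample):
    print(finding["who"], finding["what"], finding["region"])
```

Whether a finding is a mistake or an attacker cannot be decided from the event alone, which is the reason James gives for routing these changes through incident response rather than a quarterly report.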
Joe Partlow: [00:16:55] Yeah, no, it’s a great point, especially around the incident response side, and that kind of segues into the shared responsibility. If something does happen, you know, who’s responsible for that? Right? Is that your team? Is that your application team? Is that, you know, AWS? Who do you get involved with that? So obviously, there are a lot of things to think about with shared responsibility. The big question that we get, and I think a lot of the misunderstanding that I see out there in the market, is who’s responsible for what. Like, what is the cloud provider actually delivering? What is the customer responsible for? How do they manage compliance? I see most cloud providers have some form of SOC 2 also. But, you know, it’s maybe for the underlying infrastructure, not all the new stuff that you’re putting on top of that. So yeah, maybe James, if you want to run through what you see as typical from a customer responsibility standpoint and typically what the AWS responsibility is. And then maybe even more importantly, where are those misunderstandings, where, you know, the customer is thinking that AWS is doing some of that?
James Berthoty: [00:18:08] Yeah, I think what you just said is the main thing that I see, where customers generally are over-assuming what AWS is responsible for. And you see this a lot in the SOC 2 world, right, where you’re really relying on AWS’s SOC 2 and trying to offload your controls to them even when it doesn’t really apply as much as you’re hoping, because really what they’re responsible for is just that hardware layer. And even this graphic can be kind of misleading, and this comes from them, right, in the sense that it doesn’t apply universally to every single service. And so you really have to be very well aware of what services you’re actually using. And this is why one of the things that we do through the SOC with AWS is flag when a new service starts getting used in a new region, just so that we can track and be aware of it. If we have a serverless application and all of a sudden we’re using an EC2 instance, which that application shouldn’t be using, that’s opening our compliance and security scope in a huge way, compared with just running Lambda functions where we’re not actually responsible for that underlying OS. But even there, you have to be very careful, because with Lambda you can import Docker files, for example, as part of your underlying OS, in which case if that Docker file was breached and an attacker is able to get a foothold in it, then it would be you who’s responsible for it.
James Berthoty: [00:19:35] But if you’re using just the base AWS Lambda, then AWS is the one who’s really responsible for it. The other main place you really see this play out is in vulnerability scanning, especially with EKS. A lot of people are assuming that if they use EKS, the hosted Kubernetes service from Amazon, then the entire Kubernetes cluster is therefore protected by Amazon and Amazon is responsible for all of it. But really all Amazon is doing is hosting the control plane. And every single Kubernetes node that’s spun up is just an instance that’s a node. And so you’re not responsible for the security of the control plane instance, but you’re still entirely responsible for the configuration that you’re pushing to the kube controller, alongside the security of the pods, the nodes, and how you’re doing networking. So even there, where it says networking, there’s a big asterisk. Yeah, they’re responsible for the electrons going through the wires, right, the physical layer of the networking. But as far as the network traffic protection goes, that’s most of networking: the encryption, the routing, which instances are able to talk to which instances. And so really there’s still a ton of responsibility that is not offloaded to AWS. As we think about how to secure it, there really is a large scope, and you have to be very aware of what services you’re using and parse out what you’re responsible for versus what Amazon’s responsible for.
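The idea of flagging when a new service starts getting used in a new region can be sketched as a comparison of CloudTrail records against a known baseline. This is a simplified illustration under stated assumptions: the baseline is a plain in-memory set of `(eventSource, awsRegion)` pairs, and the function name and sample data are hypothetical; it is not how any particular SOC implements the check.

```python
def new_service_regions(records, baseline):
    """Return (service, region) pairs seen in records but absent from baseline.

    Each pair is reported once, in first-seen order.
    """
    seen = set(baseline)
    new_pairs = []
    for record in records:
        pair = (record["eventSource"], record["awsRegion"])
        if pair not in seen:
            seen.add(pair)
            new_pairs.append(pair)
    return new_pairs


# Hypothetical baseline for a serverless application.
baseline = {("lambda.amazonaws.com", "us-east-1"), ("s3.amazonaws.com", "us-east-1")}

# Invented CloudTrail-style records.
records = [
    {"eventSource": "lambda.amazonaws.com", "awsRegion": "us-east-1"},
    {"eventSource": "ec2.amazonaws.com", "awsRegion": "us-east-1"},
    {"eventSource": "ec2.amazonaws.com", "awsRegion": "us-east-1"},
    {"eventSource": "lambda.amazonaws.com", "awsRegion": "eu-west-1"},
]

print(new_service_regions(records, baseline))
```

In the sample data, EC2 showing up for an application that is supposed to be Lambda-only is the kind of scope change the transcript describes: the team has become responsible for an operating system it was not responsible for before.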
Joe Partlow: [00:21:06] Yeah, and another thing to think about is how much you’re using the services, right? That could be as simple as, hey, I’ve got one application that’s running on something that’s built in with AWS, or I may have a series of applications, or maybe I have my whole infrastructure in AWS. So obviously there are different layers in there and different answers to who’s responsible for what. One of those obviously is identity. And jumping into identity, that’s a tough one for most folks. Forget about all the granularity and levels of detail you can get on the cloud side; that’s been a struggle for most people since before the cloud. I’ve heard of customers with 20 or 30 different identity sources on their on-prem devices, never mind the cloud side. So obviously, it’s a critical piece. James, maybe go through what you’ve seen: how does AWS manage those identities? Obviously, it’s not very straightforward. But then how does that integrate into the security program in general?
James Berthoty: [00:22:11] Yeah, this is where I do think it’s unfortunate that the diagram you’re seeing is the correct way to do AWS identity, but it’s also a very complicated multistep diagram. Those are the pros and cons: if you’re going to do this correctly, it really does take a level of investment in SSO and in really understanding how identity works. Because in a bad scenario, usually how an account will get started is using static access keys for developers, where they’re just creating a management console layer, but then there are access keys, and then all of a sudden, as soon as you start using access keys tied to user accounts, you’ve opened yourself up to having to rotate them, having to monitor their activity on a direct one-to-one basis, and even starting to attach policies to users. Right? So even when that gets started, people will usually use the managed policies. And I’ll give an example of where this has gone bad before. There was an instance where the AWS support team pushed a policy update that changed the default read-only policy to have read-write access to one specific database service. And so if you were directly attaching managed policies to users at a large scale, you just opened up all of your users, on the one hand, to the security issue of being able to write to a database.
James Berthoty: [00:23:42] And on the other hand, they also lost all their read permissions to everything else. And so if you had scripts that were running based on user accounts, all of a sudden all of that stuff broke. And so this is why it’s really important to picture this at a large level and attack each of the things that you see on this diagram. First of all, have a master account, so that you don’t just have all of these AWS accounts being spun up all over the place. An example of some of the content we have is alerting when new accounts are created and alerting when any configuration change happens at the master account level. Just for example, we push our firewall policies from a master account down to all of the subaccounts. And so obviously we need to put the greatest amount of content and incident response monitoring on that master account configuration. But then from there, for all of those users, instead of using static access keys to access the environment, it’s very important that you move towards role assumption. And the reason that’s important is that when someone assumes role into an account, it allows you to, first of all, consolidate your permission sets. So for us, for example, we have Tier one, Tier two, Tier three accounts across every major department or group that needs access to different resources in AWS.
James Berthoty: [00:25:04] And then we’re able to log and track the usage. Say someone needs to use a Tier three account, which has very high-level permissions. We want a higher level of incident response on that Tier three account, to know when they’re using that role assumption. And so instead of every user getting their own group assignment, the fact that they’re assuming role lets you have just that one place to manage the permissions for that group of users in the role assumption. The other big piece that’s important, and this is a lot of technical detail, is that when you do role assumption, you’re getting an STS token back instead of the static access creds. When someone uses access credentials in AWS, it just saves those in plain text to a .aws file on your computer that is very, very easy to breach. And so a very common cause of breaches is someone using access credentials that they created for a user account, and that’s just running on an EC2 instance that’s open to the world. There are a lot of ways to prevent that, from the networking side, from the monitoring side. But one of the easiest, most important things you can do is just make it so that you’re using EC2 roles, for example, where that machine is assuming role at the account level into a role that has specific permissions defined for it.
James Berthoty: [00:26:23] And then from that, the attacker is not able to gain these plain-text credentials, but they’re able to gain an STS token that would time out after a ten-minute interval or so. Or, sorry, you can set that timeout period to be as long as, I think, 12 hours is the maximum. And so it’s really about using that token-based authentication to make it so that you’re never using these access keys that can sit out in a GitHub bucket somewhere for years and years and still have access years later. It’s really emphasizing that role assumption layer and moving towards the SSO config, because SSO works through doing the role assumption, which really saves your security team a lot of maintenance headache down the road trying to rotate, manage, and assign owners to all these access keys and work out who has access to what. And so really that’s the high level: staying focused on a couple of things, right? Staying focused on having it consolidated in a master account, moving towards SSO, not using the managed policies, and really moving towards role-based assumption and trying to stay away from user-based static credentials. That’s if I had to try to summarize all the problems with IAM in AWS.
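The difference between static access keys and role-based temporary credentials is visible in the access key ID itself: AWS documents that long-term access key IDs begin with `AKIA`, while temporary STS credentials begin with `ASIA`. The sketch below uses that prefix to separate the two kinds of key IDs in a block of text such as a committed config file. The regular expressions assume the commonly seen 20-character form, the function name is hypothetical, and a real secret scanner would need to do considerably more.

```python
import re

# Assumption: key IDs are the prefix followed by 16 uppercase letters or digits.
LONG_TERM_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
TEMPORARY_KEY = re.compile(r"\bASIA[0-9A-Z]{16}\b")


def classify_key_ids(text):
    """Split access key IDs found in text into long-term and temporary ones."""
    return {
        "long_term": LONG_TERM_KEY.findall(text),
        "temporary": TEMPORARY_KEY.findall(text),
    }


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation;
# the ASIA value is an invented placeholder in the same shape.
leaked_config = """
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE

# session credentials copied from a role assumption
aws_access_key_id = ASIAIOSFODNN7EXAMPLE
"""

found = classify_key_ids(leaked_config)
print("long-term keys:", found["long_term"])
print("temporary keys:", found["temporary"])
```

A long-term key found this way stays valid until someone rotates or deletes it, which is the "still have access years later" risk from the transcript; a temporary key also needs its secret and session token to be usable, and it expires on its own.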
Joe Partlow: [00:27:42] Yeah. And it’s it’s you know, obviously we’ve seen time and time again those credentials kind of in GitHub or GitLab accounts that that really kind of opens it up. So kind of along those lines from a visibility standpoint, you brought up some good points obviously, of how do you how do you kind of look for that type of stuff? So if we’re talking about kind of specifically just, you know, visibility in general to environment environments, you know, obviously it’s critical. You can’t secure what you can’t see. So what are some of the things that security teams probably should look for, you think?
James Berthoty: [00:28:13] Yeah. This is where I think 100% visibility is always the target, but it is very, very hard to get there. There is an opportunity with AWS to always have 100% visibility at the IAM and cloud infrastructure layers. And so with this diagram that we see here, the idea is you want to really focus on where the outer arrows can come into the infrastructure, because those are your public-facing endpoints. So there you have IAM as public-facing, right, anyone can try to log in. If I were to get someone else’s access credentials, I could very easily run any commands as if I was them. So that’s one public-facing piece, and the other public-facing piece is your application, right? That’s where you want to build out application logging and application security, looking at the OWASP Top Ten, all of those things, because the application can be used to pivot into your server infrastructure, and IAM can be used to pivot into your cloud infrastructure. And so really the idea here is staying focused on, first of all, where to start. I think where to start is on the IAM side; cleaning that up should be top priority for anyone who’s moving to AWS and wants to start investing in a security program. It’s really taking time to define your IAM policy and to clean up the permissioning. But then the other piece you want to look at is your application security side, making sure you’re safe from very common attacks like cross-site scripting, by implementing CSP, and SQL injection.
James Berthoty: [00:29:36] But I don’t want to go too far down that road because we’re focused on the AWS side. The point is just that if you focus there, you also want to be thinking about what vendors or tooling you’re using across each of these layers. A lot of times when we think about AWS security, especially because there are a lot of tools you can buy in the marketplace, those tools have very different infrastructure in terms of their ability to monitor each of these layers in depth. So, for example, a lot of tools only do what’s become CSPM, Cloud Security Posture Management, which is really running a scan, using APIs to check things like: do you have your IAM set up properly? Is a permission over-configured? But they don’t have any actual visibility into the running servers themselves, right? That’s when you need to look to your vendors to actually put an agent on your static servers that’s paying attention there. But then again, we talked about containers, right? You need to know if your EDR is set up in such a way that it has visibility into the containers. Similarly, on the cloud infrastructure side, not every CSPM supports stuff like Lambda functions, right? So how do you make sure that your Lambda functions are being configured and deployed in a way that is secure, that you have GuardDuty on there? That’s just your general configuration-layer stuff.
Full video available at: https://www.reliaquest.com/resource/webinar/soc-talk-securely-migrate-to-and-protect-amazon-web-services-environments/