Episode 145: Managing Static Code Analysis with Tiago Ruivo

Tiago Ruivo is a Senior Regional Success Architect here at Salesforce. He started his career in computer science and was introduced to Salesforce through pure luck as a customer. Eventually, he made the jump over to working for Salesforce and, over time, progressed from being a junior developer to holding his current role.

Today, Tiago and I are talking about one of my favorite subjects: static code analysis. Throughout the episode, Tiago walks us through how to run it, how to manage it, and how to operate it with your teams.

Show Highlights:

More about Tiago’s background and career history.
How he describes his current role.
What static code analysis is and where it falls in the development cycle.
What the term “shift left” means.
What static code analysis is good at catching and what it’s not good at catching.
The tools that make up static code analysis.
The first steps to take when you run code analysis.
Best practices for making your own set of rules with static code analysis.
Where you can learn more about all of this.

Episode Transcript

Tiago Ruivo:
I always enjoyed trying to break them and finagle with a command line, all the way from when I was a kid.

Josh Birk:
That is Tiago Ruivo, a senior regional success architect here at Salesforce. I’m Josh Birk, your host of the Salesforce Developer podcast. And here on the podcast you’ll hear stories and insights from developers, for developers. Today, we sit down and talk with Tiago about one of my favorite subjects, static code analysis. And he’s going to walk us through a little bit about how you can run it, how you can manage it, and how you can operate with it with your teams. But we’re going to start, as we often do, with his early years. In fact, we’re going to pick right back up and ask a little bit more about breaking things.

Tiago Ruivo:
Yeah. No, not physically. Break them is more, try to play with the settings and with the command line and see what I can do there. We used them for gaming too, my brother and I. And we always shared a computer and so we always tried to finagle our settings and finagle our way around.

Josh Birk:
Got it. What were some of your early games?

Tiago Ruivo:
The Portuguese is going to come out here. Very much FIFA was-

Josh Birk:
Nice.

Tiago Ruivo:
… one of my early games, the soccer thing got in from the beginning.

Josh Birk:
Got it. You and several hundred thousand other people, I believe.

Tiago Ruivo:
Yeah.

Josh Birk:
Now, how did you end up going from a degree in Madrid to getting a degree in Illinois?

Tiago Ruivo:
It was a pretty cool experience. There was an agreement between the two universities I had, and so ultimately I was able to get a little bit of a double program, and that’s why I have the two masters. I mostly just came here to do some research and finished grad school here.

Josh Birk:
Nice.

Tiago Ruivo:
And ended up staying.

Josh Birk:
Well I got to say, as a lifelong Illinois person, some of us might say that you did that the wrong way around.

Tiago Ruivo:
I was pretty lucky with my career and my IT life here on this side of the pond. So obviously, I’m very happy. I love Chicago.

Josh Birk:
I love Chicago too, all jokes aside. What did that look like? What did you transition into after school?

Tiago Ruivo:
I came here to a computer science program. I was in a computer science department doing some research, I did a little bit of big data. And actually, my first job after school that I got through school was Fermilab, which is this particle accelerator that is here in the suburbs of Chicago that a lot of people don’t know about. It’s this super cool department of energy lab that literally accelerates particles, and they have a lot of super computing stuff. They analyze a lot of the data from CERN in Switzerland. And so I was working in their computing research department doing something that had very, very much nothing to do with Salesforce.

Josh Birk:
I was going to say, big data, little particles, you seem to have a bit of a spread there. Let’s ask the question, when did you first get introduced to Salesforce?

Tiago Ruivo:
My first contact, it was actually pure luck. My first contact was as a customer. When I was in grad school, my research colleague in the lab was the one that maintained the internal instance of the university as part of their student work. And so I was able to help him model it a bit and learn a bit about Salesforce, instead of actually doing our job, or while things ran and we were waiting for things. That’s my first contact was as a customer. When I actually got into it and started doing something a little bit more professional was in a partner. I completely, by chance again, I was actually working in, again, this super computing research, very firmware, hardware heavy job.

Tiago Ruivo:
And I got a recruiter reach out for engineer developer role for a company that I knew nothing about and have never heard before. And I went for an interview, I really liked the culture. I really liked what they did. They moved quickly, which is the opposite of what a government lab does.

Josh Birk:
Right.

Tiago Ruivo:
And so I made the jump and I started as a junior developer, a lot of time as a technical architect, and keep at it now.

Josh Birk:
Nice.

Tiago Ruivo:
More than a decade later.

Josh Birk:
Wow. And when did you join the mothership?

Tiago Ruivo:
About three years ago. I’m actually about to turn four here in October. So maybe by the time this episode airs, four. And since then I’ve always been in salesforce.org, so our nonprofit and education side of Salesforce.

Josh Birk:
How would you describe your current job?

Tiago Ruivo:
My current job, it’s very non-structured, which I actually really appreciate. I’m what we call a success architect in our customer success group. What that means is that we tend to engage with our largest customers to make sure that they’re setting up the architecture of their platform in the right way, that they’re following best practices, that they are solutioning things appropriately. Once you have complex multi org structures, complex system architectures, we get involved in making sure that they’re using the right tools and scaling the platform appropriately.

Josh Birk:
Gotcha. Well speaking of that, our topic of the day, static code analysis I think probably falls a lot into that. But let’s start with the basics, what exactly is static code analysis?

Tiago Ruivo:
Yeah. No, static code analysis ultimately is a tool that you can use to run quality checks on your code base, and to make sure your code is readable and maintainable, and scalable and follows your coding standards. Some of the things that are interesting about static code analysis is that, one, it’s an automated tool, so it’s something that it can run on the background. And you don’t have to have any human look at it. And then it’s something that is actually not going to run your code. And so it’s just going to look at the text of your code, and based on a series of rules you define, figure out if you’re violating any of those best practices or coding standards.

Josh Birk:
Gotcha. So not a unit test, not something that’s actually engaging with the code and moving the parts around.

Tiago Ruivo:
Absolutely not. Yeah.

Josh Birk:
Elevator pitch is like an automated code review to catch the big flag stuff and the basics.

Tiago Ruivo:
Exactly right. That’s correct.

Josh Birk:
Where does that fall into the development cycle? Where do you start hooking in the static code analysis?

Tiago Ruivo:
My recommendation, and what I always tell customers is, as soon as you can. When we actually talk about fixing issues with code or fixing issues with any technology, the earlier you catch them, the easier it is to fix them. The cheaper it is, the faster it is. And so the earlier you can have these checks, the better your code base is going to be, and the faster the development process is actually going to be. The recommendation here is to really put them as part of your coding environment, as part of your IDE. And there’s a bunch of extensions that you can add to your Visual Studio Code or any other IDE that can run this static code analysis for you, even as you write the code. And you’ll see, highlighted in either a tab or even on the actual line of code you’re writing, some of the violations you are committing or that the tool detected as part of your coding. So by the time you ultimately push this to a sandbox or commit this to a repository, it’s already completely free of any issues.

Josh Birk:
Got it. And we’re going to walk through some of your recommendations for best practices, but I feel like I accidentally just jumped ahead a little bit there. Is what you’re describing what’s called shift left?

Tiago Ruivo:
Yes. And that’s a very consult-y buzzword within the world of DevOps. But ultimately, [inaudible 00:08:21]-

Josh Birk:
Just to [inaudible 00:08:22], I’ve heard it, I’ve seen it in action, I still don’t quite understand the etymology. When the DevOps people say shift left, where is that coming from?

Tiago Ruivo:
Sure. If you think about the development process and the release cycle of a specific feature as something that goes through your development in a dev environment, you have a QA, you have integration environments and staging environments, and UATs, and so on and so forth. People tend to represent that as a little bit of a train left to right. And so your feature goes all the way from the left where you are developing, to the right where it’s released to production. So when we say shift left, it just means pushing those kinds of activities as close to the beginning of the development process as you can.

Josh Birk:
Got it. And that way, as you just said, it doesn’t even have to get into the sandbox, you’re not waiting for the UAT. And to, again, that makes it faster, leaner, and easier to catch the bugs as opposed to trying to figure out why UAT is all broken.

Tiago Ruivo:
Exactly. If by the time you actually find something, it’s already in a UAT environment, you need to go get your actual dev sandbox up to date with what’s in UAT. Then pull the code down, then go fix it, then push it, then go through the code review process, then do three or four deployments and run the unit tests. That takes a while, and it can be a line of code that you want to capitalize a specific variable name, or something like that.

Josh Birk:
Right. Right. And I think it’s come up on the show before, because it’s not like you can wait until UAT and then find something wrong, and you’re really just waiting in UAT phase. You really are doing what you just described, you’re sending yourself back to go. You have to start the process over again and get yourself back up to UAT. So you’re killing yourself from a timeline point of view.

Tiago Ruivo:
Exactly. And a lot of times, how your sprint tends to be set up, you don’t actually have a lot of time to fix issues. So what ends up happening a lot of times is that you drop that feature out of that release and you’re just delaying the actual release process.

Josh Birk:
Right. Right. Okay. Let’s go back and level set a little bit more on static code analysis itself. First of all, and I love the example I’ve seen in one of your presentations on this. What is static code analysis good at catching and what is it not good at catching?

Tiago Ruivo:
Sure. Static code analysis is really good at catching things that tend to be in your coding standards. If you think about things like readability of your code, things about complexity of your code, things about even some of these security best practices or performance best practices, static analysis is great at catching all of that. And you really can automate a lot of your code review process. What static code analysis doesn’t know is, first, how your code is going to run. It doesn’t actually run your code. And so anything related with actually running of the code, it’s something that the static code analysis just doesn’t have the information to catch.

Tiago Ruivo:
And then the second one is, it doesn’t know your business processes. It doesn’t actually know what you’re trying to implement, it’s just a general check for general rules. Anything that is functionality based, my code should do this rather than should do that, it’s something that static code analysis is not going to catch. And so this is where something like a unit test or a solution review, or both probably you want to have, can really complement what the static code analysis is going to give you.

Josh Birk:
Gotcha. So to summarize, it can catch best practices because best practices is something you can document. But even if I’m using best practices, I can do things like delete all the accounts under an opportunity even if I shouldn’t do that.

Tiago Ruivo:
Exactly. I have that example somewhere in a presentation where I have a statement that is just a delete of all the accounts in your org. Static analysis is not going to catch anything about it, because it doesn’t know if you actually want to delete all your accounts in your org or not.

Josh Birk:
Right. So let’s-

Tiago Ruivo:
If you then not capitalize one of the… If you don’t capitalize your class names, it’s going to scream at you, because obviously that’s something that is general and that can be just part of your naming conventions.
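
Editor's note: the contrast above can be sketched with a toy, text-only checker in Python. This is not how PMD works internally (PMD parses the code properly rather than relying on a regular expression), and the rule, the function name `find_violations`, and the sample Apex snippet are all hypothetical. The sketch only reads the source text, flags a lowercase class name, and has no way of knowing whether deleting every account is intended.

```python
import re

# Hypothetical rule: Apex class names should start with an uppercase letter.
CLASS_DECLARATION = re.compile(r"\bclass\s+([A-Za-z_]\w*)")


def find_violations(source: str) -> list[str]:
    """Scan source text line by line without ever executing it."""
    violations = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = CLASS_DECLARATION.search(line)
        if match and not match.group(1)[0].isupper():
            violations.append(
                f"line {line_number}: class '{match.group(1)}' "
                "should start with a capital letter"
            )
    return violations


# Hypothetical Apex snippet: a naming problem plus a risky business action.
SAMPLE_APEX = """\
public class accountCleaner {
    public static void run() {
        delete [SELECT Id FROM Account];
    }
}
"""

if __name__ == "__main__":
    for violation in find_violations(SAMPLE_APEX):
        print(violation)
```

Only the class name on line 1 is reported. The `delete` statement passes untouched, which is why a unit test or a solution review is still needed alongside the scan.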

Josh Birk:
Nice. Now, we’re saying static code analysis is an umbrella term, but it’s actually a couple of tools. What are those tools?

Tiago Ruivo:
Yeah. Ultimately, best practices and coding standards really depend on the programming language you’re using. And so ultimately, static code analysis, as it is, is based on a couple of open source projects that ultimately define default sets of rules that you can just use as part of your environment. You don’t have to even customize or worry about building your own. Out there, there’s a bunch of open source projects that have standard sets of rules, and you can just take them and use them as they come. You don’t need to make this super complicated. For us at Salesforce, we have an open source project called PMD that has the default rule sets defined for Apex and Visualforce. And then we recommend using ESLint for rules related with JavaScript. And on top of that, we actually have a repository that we extend for Lightning Web Components specific best practices too outside of JavaScript.

Josh Birk:
Gotcha. Now, one of the things that can happen, especially I think if somebody is using a tool like PMD or ESLint for the first time, and maybe they’re realizing just how off centered their code is the first time they might run it. Especially in the kind of organizations you’re talking about, that might be multi org and they might have some very old repositories. Well, let me throw a quick, what’s the largest number of errors that you think you’ve seen? What’s that overwhelming number that is actually a realistic possibility for somebody who’s working in one of those orgs?

Tiago Ruivo:
For a single org, and so with a large code base, I found… They call them violations. I found violations in the millions.

Josh Birk:
In the millions?

Tiago Ruivo:
Yes. So millions of violations in a single org. And think that you actually have Apex limits for how many characters you can have in an org.

Josh Birk:
Right. Okay. Well, [inaudible 00:15:09] that… Yeah, [inaudible 00:15:10] go.

Tiago Ruivo:
I like to say that’s a motto when I present this to customers, that static analysis is a little melodramatic. Because it’s a little trigger happy when we talk about rule violations. And so what ends up happening, and I think this is where we are going, is that when you first run a static code analysis and you look at having, I don’t know, 150,000 violations. There’s nothing you can do with it, it’s overwhelming. You don’t have time or capacity in your team to dedicate the next four months to go fix 150,000 violations in your code. And so, you really need to figure out how to make the best out of those results.

Josh Birk:
Yeah. I was expecting you to say somewhere in the hundreds of thousands, the idea that somebody’s… If I was a developer and I saw anywhere around a million, I think I would become a librarian. I think I would just quit. I’d just be like, “No, no, this is not going to be the rest of my life.” But you’ve seen consistently that number is scary, but it’s something you can conquer, right?

Tiago Ruivo:
Exactly. And we’re going to get into this, but it’s all about prioritization. If you think about the fact that you have a Salesforce developer, they come from Python, they capitalize their variables differently than we do, or they capitalize their class names differently than we do. And suddenly you have 750 classes. And so if they capitalize class names differently, that’s going to be 750 violations of the rules, just for starters.

Josh Birk:
Right. What’s your first starting point to tell to people? Is it trying to categorize and figure out what’s actually a problem, like a security flaw, or is it customizing PMD or something?

Tiago Ruivo:
Sure. My first step when you’re running this code analysis is really prioritization. It’s kind of obvious, right? You look at 75,000, you’re going to need to prioritize this. I think the thing to know is that prioritization here is really not complicated either. Ultimately, you’re dealing with these default rule sets from something like PMD or ESLint. They have a set of rules that already exist, that they define. That’s going to be mostly what you are going to be using. And each rule tends to be assigned to a category. And so you have, I think for PMD, seven or eight categories. You can actually just prioritize your violations at the category level and say, for example, something like security, it’s something you’re going to need to fix now versus code documentation or code style, it’s something that you can fix later.

Tiago Ruivo:
The first level is prioritization. My rubric is really very simple to do this, I tend to categorize at the category level. Again, things like errors or things like error prone things tend to be at the top. I have three priorities. I categorize things into issues, warnings, and nice to haves. Again, keep it very simple. And then, within each category I assign it either a category to be an issue, to be a warning, or to be a nice to have. And so again, your security, your error prone things are going to be issues. Things like performance or things like best practices tend to be warnings. And all of these stylistic stuff or documentation stuff tends to fall on the nice to have category.
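
Editor's note: a minimal sketch of that rubric in Python, assuming scan results have already been loaded as a list of records with a `category` field. The record shape, the function name `bucket_violations`, and the sample data are hypothetical; the category names follow PMD's Apex categories, and the bucket assignments mirror the ones described above. Categories not mentioned in the conversation fall into an `unclassified` bucket so that someone on the team decides explicitly.

```python
from collections import Counter

# Bucket assignments follow the rubric described in the conversation.
# They are a team judgment call, not a PMD default.
CATEGORY_TO_BUCKET = {
    "Security": "issue",
    "Error Prone": "issue",
    "Performance": "warning",
    "Best Practices": "warning",
    "Code Style": "nice to have",
    "Documentation": "nice to have",
}


def bucket_violations(violations):
    """Count violations per priority bucket based on their category."""
    counts = Counter()
    for violation in violations:
        bucket = CATEGORY_TO_BUCKET.get(violation["category"], "unclassified")
        counts[bucket] += 1
    return counts


# Hypothetical scan results; rule names are illustrative and real tools
# report many more fields (file, line, message, and so on).
SAMPLE_VIOLATIONS = [
    {"rule": "ApexSOQLInjection", "category": "Security"},
    {"rule": "ClassNamingConventions", "category": "Code Style"},
    {"rule": "ClassNamingConventions", "category": "Code Style"},
    {"rule": "ApexDoc", "category": "Documentation"},
    {"rule": "OperationWithLimitsInLoop", "category": "Performance"},
]

if __name__ == "__main__":
    for bucket, count in bucket_violations(SAMPLE_VIOLATIONS).items():
        print(f"{bucket}: {count}")
```

Even in this tiny sample most violations land in the nice to have bucket, which is the pattern the rubric is meant to expose on a real code base.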

Josh Birk:
Gotcha, gotcha. That also seems to fit into trying to prioritize the solutions themselves. Because if I had 75,000 violations and 10 of those were security violations that my unit tests aren’t catching, or whatever other process should be catching a security flaw isn’t catching, then obviously I would hope that it’s not much more than 10. But it makes a lot of sense to throw a lot of resources to fix those 10 as quickly as possible, while basically ignoring things like stylistic changes. Which may be in a code review a good thing to have, but currently isn’t technically breaking anything. Your developers aren’t quitting over those kind of best practices. I guess it sorts itself out from a triage point of view as well.

Tiago Ruivo:
Exactly. The whole point is to maximize the impact of actually running a static code analysis and make it useful. You really want to find the needles in the haystack of those 75,000 violations. And based on my experience, ultimately this kind of categorization is really going to, number one, maximize your impact. And again, dedicate the resources to go fix what matters. But also, you are going to realize that 95% of your violations are going to go into that nice to have bucket, are going to be stylistic violations. Hopefully. If you find yourself with 90% of security violations, you have a bigger problem to solve. And that’s why I said static code analysis can be a little bit melodramatic. Running this in general, most of the violations that it’s going to find are going to be stylistic in nature.

Josh Birk:
Yeah. I feel like if I was a senior developer at a large company with a large org with a lot of code, and I ran this for the first time and I had 95% of my violations were either security or performance issues, then I probably should go become a librarian. Somebody else should come take my job at that point. Now, how does this compare over to the LWC side, over to the ESLint side? Is it pretty much a similar story?

Tiago Ruivo:
It’s very similar. Ultimately, in JavaScript you have some things that are very much error prone that tend to be issues. And you have your security vulnerabilities in the JavaScript side too. And those fall in the issue bucket. In the warning bucket, you have a lot of the unused variables, performance stuff. You have a lot of the rules associated with even ECMAScript 6 adoption, and adopting the new ways of doing things. And then there’s obviously a lot of the stylistic stuff that is always there. One thing that is different on the JavaScript side, I’m a sucker for using “use strict” everywhere, and making sure that people are doing that. So for me, that’s an issue. People can disagree. And what I would say here is that, in general when approaching this prioritization and how you’re assigning different priorities to different violations, or different rules or different categories, this is not a science. It depends on how large your team is and how much risk you’re willing to accept, and how much time you actually have to do this. You can decide to prioritize very differently than what I have. And there’s no right answer.
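
Editor's note: one way to picture the JavaScript side is a small Python script that turns the three buckets into the `rules` section of an ESLint configuration. The rule names used here (`no-eval`, `strict`, `no-unused-vars`, `prefer-const`, `no-var`, `camelcase`) are core ESLint rules, but which bucket each one belongs to is an illustrative assumption, as is the choice to switch nice to have rules off rather than leave them as warnings. ESLint only has `off`, `warn`, and `error`, so three buckets have to be squeezed into those levels somehow.

```python
import json

# One possible mapping from the three buckets to ESLint severities.
BUCKET_TO_SEVERITY = {"issue": "error", "warning": "warn", "nice to have": "off"}

# Illustrative bucket assignments; a team may well prioritize differently.
RULE_BUCKETS = {
    "no-eval": "issue",
    "strict": "issue",
    "no-unused-vars": "warning",
    "prefer-const": "warning",
    "no-var": "warning",
    "camelcase": "nice to have",
}


def build_eslint_rules(rule_buckets):
    """Translate rule -> bucket into rule -> ESLint severity."""
    return {rule: BUCKET_TO_SEVERITY[bucket] for rule, bucket in rule_buckets.items()}


if __name__ == "__main__":
    # Prints a JSON fragment shaped like the "rules" section of an ESLint config.
    print(json.dumps({"rules": build_eslint_rules(RULE_BUCKETS)}, indent=2))
```

Keeping the bucket decision in one table makes it easy to revisit as the team's appetite for risk changes, without touching each rule individually.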

Josh Birk:
Right. And in moving into one of your other best practices is, because by nature static code analysis has to be opinionated. It’s not offering you a, well, I think your code should look this way. Just by the way it works, it’s saying your code should look this way. What’s available to us, if say, I’ll take your previous example. What if it’s a bunch of Python developers, literally just a team of Python developers, and that’s just the way they want to code Apex? They want to code Apex, it looks a little bit more like Python. Can you tell PMD, hey, back off, our opinion here is that this is what the style should be?

Tiago Ruivo:
Yes. I would say, please don’t.

Josh Birk:
That was an extreme example, I will [inaudible 00:23:32].

Tiago Ruivo:
But you totally can. Ultimately, the good thing about this is that it’s all based on open source projects. The GitHub repos are out there, and so you can make your own rule set. You don’t have to use PMD, you don’t have to use ESLint. I would very much recommend starting with those. What you can do is you can just fork the PMD repository, if you want to talk about Apex. So fork the PMD repository and then change it as needed. You can remove a bunch of rules, you can customize the existing rules and change them to make sure that they reflect what you actually want. An example that I tend to use when we talk about customizing rule sets is that, we talked a little bit about prioritization. Typically, PMD and ESLint already assign priorities and severity to specific rules. They are not necessarily priorities and severities that you should trust blindly.

Tiago Ruivo:
Oftentimes, they assign priorities per category and they have a little bit of a different rubric that may be very different than what you actually want to do. My recommendation there is do not trust the default priority that gets assigned. And so what you can do is you can check out PMD. And these rules in PMD are just XML. I guess Salesforce developers are probably familiar with that. And so there’s just a priority tag that you can set to whatever priority you actually want to assign to that specific rule. And once you run the static code analysis again, that’s going to automatically prioritize your results.
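
Editor's note: as a sketch of what that priority tag looks like, the Python script below generates a small custom ruleset file that references existing PMD Apex rules and overrides their priority. It relies on PMD's documented ruleset format (a `rule` element with a `ref` attribute and a nested `priority` from 1, highest, to 5, lowest), which is one way to reprioritize without editing PMD's own files. The specific rule references and the priorities chosen are illustrative and should be checked against the rule reference for the PMD version in use; `build_ruleset` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RULESET_NS = "http://pmd.sourceforge.net/ruleset/2.0.0"

# Illustrative overrides: 1 is the highest priority, 5 the lowest.
PRIORITY_OVERRIDES = {
    "category/apex/security.xml/ApexSOQLInjection": 1,
    "category/apex/performance.xml/OperationWithLimitsInLoop": 3,
    "category/apex/codestyle.xml/ClassNamingConventions": 5,
}


def build_ruleset(name, overrides):
    """Return ruleset XML that re-prioritizes existing rules by reference."""
    # Serialize the ruleset namespace as the default (unprefixed) namespace.
    ET.register_namespace("", RULESET_NS)
    ruleset = ET.Element(f"{{{RULESET_NS}}}ruleset", {"name": name})
    description = ET.SubElement(ruleset, f"{{{RULESET_NS}}}description")
    description.text = "Team-specific rule priorities"
    for ref, priority in overrides.items():
        rule = ET.SubElement(ruleset, f"{{{RULESET_NS}}}rule", {"ref": ref})
        ET.SubElement(rule, f"{{{RULESET_NS}}}priority").text = str(priority)
    ET.indent(ruleset)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(ruleset, encoding="unicode")


if __name__ == "__main__":
    print(build_ruleset("Custom Apex rules", PRIORITY_OVERRIDES))
```

The rules themselves are left untouched here; only their relative importance changes, so the next scan comes back already sorted the way the team wants.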

Josh Birk:
Gotcha. Going back to the Python joke, my familiarity with PMD is that a lot of these rules, this isn’t just like Robert Soseman got up one morning and decided he was going to write some random rule in the PMD. These are generally broadly accepted best standards, best practices that have been really reviewed on a community basis. If you’re going to argue with PMD, you should probably think twice about it.

Tiago Ruivo:
Absolutely. Yes. And again, really, you should really not be changing PMD much. You can add your own coding standards, you can add your own naming conventions. If you want to end each test class with test, you can add a rule that does that for you, for example. But these are pretty much based on Apex development best practices and naming conventions that are very much public out there. In 99.9% of the cases, I don’t see a need to necessarily change any of these. I think the prioritization, or things like that, may be something that you may want to change, but the rules themselves are really a great start for code quality analysis.

Josh Birk:
Yeah. No, that makes a lot of sense because you’re not changing the rule, you’re just changing what the importance is to you compared to what it was from default. Where can people learn more about this?

Tiago Ruivo:
Where can people learn? PMD and ESLint have websites that are actually pretty complete. They have very nice rule descriptions that are automatically generated from their rule sets. And so, that is a great way to get started. Ultimately, you also need a tool to run your static code analysis. You are going to need your command line tool, or you are going to need your Visual Studio Code extension, or you are going to need some tool to run this. I would also push people to just the documentation for those specific tools, because that’s actually where you’re going to end up figuring out how to run this. And I tend to recommend PMD Apex in Visual Studio Code as an extension. There’s an ESLint extension for Visual Studio Code. And then there is something called… I want to say it’s called Salesforce Code Analyzer plugin-

Josh Birk:
Yes. Sounds right.

Tiago Ruivo:
It’s an extension that you can install on top of the SFDX CLI that allows you to run all of these from your command line. And it already has the latest versions of PMD and ESLint, and so on and so forth.
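
Editor's note: a small Python sketch of assembling such a command line run. It assumes the plugin exposes a `scanner:run` command with `--target`, `--format`, and `--category` flags, which is the editor's recollection of the plugin at the time of the episode and not something stated in the conversation. Newer releases may use different command names, so check the plugin's own help output before relying on this. The helper name `build_scanner_command` and the `force-app` path are hypothetical.

```python
import shlex


def build_scanner_command(target, output_format="json", category=None):
    """Assemble the argument list for a scan (assumed command and flag names)."""
    command = ["sfdx", "scanner:run", "--target", target, "--format", output_format]
    if category:
        # Assumed flag: restrict the run to one rule category, e.g. Security.
        command += ["--category", category]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without the CLI.
    print(shlex.join(build_scanner_command("force-app", category="Security")))
```

Running only the highest-priority category first is one way to keep the first scan of a large org from being overwhelming.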

Josh Birk:
And that’s our show. Now before we go, I did ask after Tiago’s favorite non-technical hobby. And among other things, like many of us, he’s, well, looking forward to getting past this pandemic.

Tiago Ruivo:
Honestly, most of my hobbies are non-technical. I tend to spend too much time sitting in front of a computer on Zoom talking to other humans alone. I guess because we are ending the summer, I’m going to say cycling. I really spend a lot of time cycling this summer and I love doing it. It’s something you can do alone, but you can also make it a social activity. I joined a cycling group this summer. And it’s something that really relaxes me and brings me joy, to quote Mary [inaudible 00:28:43]-

Josh Birk:
Nice. I want to thank Tiago for the great conversation and information. And as always, I want to thank you for listening. Now, if you want to learn more about this show, head on over to developer.salesforce.com/podcast, where you can hear old episodes, see the show notes, and have links to your favorite podcast services. Thanks everybody. Talk to you next week.
