Lessons using Ansible at J. Crew

Ansible, Ansible, Ansible. Oscar González, principal engineer at Sawyer Effect, gave a unique presentation today about J.Crew’s use of DevOps and Ansible Tower by Red Hat. As you may know, Red Hat acquired Ansible in October 2015 and the addition has been phenomenal. Ansible gives your business simple, agentless automation technology.

“I’m a developer. I’m sorry.”

In 2015, Sawyer Effect was brought in to J. Crew to help improve their deployment process. They had a problem: a deployment would take 4-5 hours and had to be done overnight. What’s more, the entire process was like a Rube Goldberg machine: lots of small moving parts that would, at some point, fail. The worst part of all of this was the toll it was taking on the teams. The human price was steep. Oscar likened this to Sisyphus: doing something over and over, learning nothing, not progressing, and keeping innovation from ever happening.

Something had to be done.

I’ll cut to the chase. J. Crew used Ansible, a DevOps approach, and their current tools and infrastructure to completely revolutionize their deployments. Oscar broke this down into 10 lessons.

 

Photo by meg

Lesson 1: You are not a unicorn

An established company with a lot of infrastructure doesn’t have the same luxury as a new company that’s able to stay nimble and “rent” its infrastructure. You have to approach things a little differently to adapt and change with the times. One key to this is to become a teaching organization. You need a core group that understands the different parts of the system and then shares that knowledge.

Lesson 2: Nash Equilibrium

If you don’t know this bit of game theory, it describes a noncooperative game of two or more players in which each player knows the equilibrium strategies of the others. At equilibrium, no player has anything to gain by changing only their own strategy. In an economic sense, this runs contrary to the older assumption that everyone pursuing their own interest produces the best result for all. A market that, as a whole, shares a common set of goals is what leads to progress for all.

When implementing the DevOps approach, they discovered that teams within the organization had different goals, and that this was hindering efficiency. One team’s efforts would sometimes cancel out another’s. They needed to behave as a single team.
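To make the game theory a little more concrete, here is a minimal sketch of a two-team game. The payoff numbers and the strategy names ("align" with a shared goal, "optimize" for your own) are made up for illustration and did not come from Oscar's talk. The check simply asks whether either team could do better by changing only its own choice.

```python
from itertools import product

# Hypothetical strategies: "align" with a shared goal, or "optimize" for
# the team's own goal. These names and numbers are illustrative only.
STRATEGIES = ("align", "optimize")

# PAYOFFS[(a, b)] = (payoff to team A, payoff to team B); higher is better.
PAYOFFS = {
    ("align", "align"): (3, 3),
    ("align", "optimize"): (0, 4),
    ("optimize", "align"): (4, 0),
    ("optimize", "optimize"): (1, 1),
}


def is_nash_equilibrium(a, b):
    """True if neither team gains by changing only its own strategy."""
    pay_a, pay_b = PAYOFFS[(a, b)]
    a_cannot_improve = all(PAYOFFS[(alt, b)][0] <= pay_a for alt in STRATEGIES)
    b_cannot_improve = all(PAYOFFS[(a, alt)][1] <= pay_b for alt in STRATEGIES)
    return a_cannot_improve and b_cannot_improve


def pure_nash_equilibria():
    """List every pure-strategy equilibrium of the game above."""
    return [
        (a, b)
        for a, b in product(STRATEGIES, repeat=2)
        if is_nash_equilibrium(a, b)
    ]


if __name__ == "__main__":
    print(pure_nash_equilibria())
```

With these made-up payoffs, the only equilibrium is both teams optimizing for themselves, even though both would score higher by aligning. That is the trap the lesson describes: separate goals keep each team stuck, while a shared goal changes the game.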

Lesson 3: The Dip

This comes from a book by Seth Godin. The teams learned that they had to stay calm and avoid toxicity to advance. They also had to learn and practice patience and empathy. If people complain about the new tools/processes, you must listen and understand the underlying reasons.

Lesson 4: Trust

Trust your people and allow them to make mistakes. You also need trust between departments. You must create an environment that fosters and enables this trust. Allow people to try new things. If the team is too fearful to say “this isn’t working,” you’re going to find out the hard way.

Lesson 5: Ansible Tower

In their implementation of Ansible Tower by Red Hat, they found that the tool generated great reports. They created reports for managers, QA teams, etc. Through these reports, they were able to automate the mundane and get rid of meaningless work. After all, if you hire smart people, you want them to solve problems, not worry about things like copying files all day. The Ansible tower-cli also allowed them to keep using their existing Jenkins tools and servers. It allowed them to improve on what they already had while keeping it.
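As a rough idea of how a Jenkins job might hand a deployment off to Tower, here is a small sketch that builds a tower-cli command line. The talk did not show J. Crew's actual setup, so the job template ID and the `version` variable are hypothetical, and the flags (`job launch`, `--job-template`, `--extra-vars`, `--monitor`) are as I understand tower-cli; check them against the version you have installed.

```python
import json
import subprocess


def build_launch_command(job_template, extra_vars=None, monitor=True):
    """Build the argv for launching a Tower job template via tower-cli.

    job_template and extra_vars are placeholders for illustration; the
    real IDs and variables depend on your own Tower setup.
    """
    cmd = ["tower-cli", "job", "launch", "--job-template", str(job_template)]
    if extra_vars:
        # Pass variables as a JSON string, with keys sorted for stable output.
        cmd += ["--extra-vars", json.dumps(extra_vars, sort_keys=True)]
    if monitor:
        # Wait for the job to finish so the calling build step can fail with it.
        cmd.append("--monitor")
    return cmd


def launch(job_template, extra_vars=None):
    """Run the command; raises CalledProcessError if tower-cli exits non-zero."""
    subprocess.run(build_launch_command(job_template, extra_vars), check=True)


if __name__ == "__main__":
    # Hypothetical template ID and variable, printed rather than executed.
    print(" ".join(build_launch_command(7, {"version": "1.2.3"})))
```

A Jenkins build step could call something like `launch(...)` after the build succeeds, which keeps Jenkins in charge of building while Tower handles the deployment and the reporting.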

Lesson 6: The single queue of work

You have different priorities for different teams. It’s a reality of business and it’s natural, but in the DevOps world, you can share these priorities to better understand each other. Open communication is key. You need to budget for unplanned work because you’ll have it. To help, you must gather utilization metrics and understand where you’re spending your time, then resolve discrepancies.
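Gathering utilization metrics can start very simply. The sketch below assumes you have a log of (category, hours) entries and a budget expressed as a share of total time. The categories, hours, and budget figures are invented for illustration and are not J. Crew's numbers.

```python
from collections import defaultdict


def utilization(entries):
    """Return each category's share of the total logged hours."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: hours / grand_total for category, hours in totals.items()}


def over_budget(shares, budget):
    """Return how far each category exceeds its budgeted share, if at all."""
    return {
        category: round(share - budget.get(category, 0.0), 2)
        for category, share in shares.items()
        if share > budget.get(category, 0.0)
    }


if __name__ == "__main__":
    # Hypothetical week of logged work and a hypothetical budget.
    log = [("planned", 30), ("unplanned", 15), ("planned", 5)]
    budget = {"planned": 0.8, "unplanned": 0.2}
    shares = utilization(log)
    print(shares)
    print(over_budget(shares, budget))
```

Anything that shows up in the second result is a discrepancy worth a conversation: either the budget for unplanned work is too small, or something is generating more interruptions than it should.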

Lesson 7: Use what you have on hand

“We use Red Hat and we have always used Red Hat.”

Oscar talked about trying to do too much at once. Avoid this. Take small steps. Breathe. They already had RPMs with yum modules set up, so why not use that to push everything? You don’t have to reinvent everything at once. They also found they could use a single playbook for everything in their business. Oscar recalled: “Ansible has a very good set of practices online, so we decided to use them to get a start.”

Photo by Christian Rivera

Lesson 8: The bus factor

Train a team, not a person. This is the “hit by a bus / eaten by a raptor” factor that everyone talks about. Now, every time there is a request, it gets sent to a team, not a single person. There’s a knowledge base to document everything. Now, people can take vacations without worry, which further reduces the human cost. No matter who’s in the building, they have everything they need to deploy.

Lesson 9: You’re a tech company

“When you think about J. Crew, you know they have great designers. You know they’re in the fashion world. In this market, when you’re fighting with all of the other companies for the same talent, you need to become a technology company.”

All businesses today have to adapt to changes in devices and changes in the market. IT is a core competency of every business. Once J. Crew realized that, they were able to advance their business by advancing their technology. They advanced their technology by embracing an IT culture and attracting the right talent.

Photo by Tom Hodgkinson

Lesson 10: Have fun

Now this is really cool. A few people outside of this core deployment group saw a need. Virtual machines need to be provisioned and spun up for different processes and people. This happens constantly. So, they made a bot, Crewbot, that lives in HipChat. This bot can give you the VMs you need with a simple request.

 

“Give me a new VM.” Done.

They also promote new projects. Each month, everyone gathers with their new ideas and a pitch for why it’s important. The best ideas win. Open source in action–live.

Results

“Nothing takes 5 minutes.”

This is all great, but what came of it? Big results. Game-changing results.

Those 4-5 hour deployments? A thing of the past. They’re now able to deploy, thanks to Ansible Tower, in 5 minutes. 5 minutes–no joke. Their operations costs are down 20%. Crewbot has saved them $750,000 so far. And that’s only a single script from an outside team. As they continue, new savings are being realized every day.

The future is bright for J.Crew

The company is adding two more brands. These brands will use the same Ansible playbook. Everything is being standardized and they want to use this model across the entire company. Oscar also mentioned that they need to upgrade to Ansible Tower 2.1 and that will be underway soon.

Want more?

Watch the video.
