Q&A Notes: testing-trophy

What is a static test?

A check that automatically catches typos and type errors at write time, without ever running the code. Like a spell checker, it flags mistakes the moment you type them.

Static: Catch typos and type errors as you write the code.
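The article's own example of this level is calling an add function with a string instead of a number; a minimal TypeScript sketch (the `add` function here is hypothetical):

```typescript
// A plain function with type annotations: the compiler checks every call site.
function add(a: number, b: number): number {
  return a + b;
}

const sum = add(1, 2); // OK, compiles and returns 3

// The line below never needs to run: tsc rejects it at write time,
// before any test exists.
// add("1", 2); // Error: Argument of type 'string' is not assignable
//              // to parameter of type 'number'.

console.log(sum);
```

No test runner is involved; the feedback arrives while the code is being written, which is what puts this at the base of the trophy.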


What is a unit test?

A test that takes the smallest individual parts of a program, isolates each one, and verifies that it works correctly on its own. It's like pulling just the brake pads out of a car and testing their performance separately.

Unit: Verify that individual, isolated parts work as expected.
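A minimal sketch of a unit test, using the coupon code calculator the article mentions later as an example (the `applyCoupon` function is hypothetical; plain assertions stand in for a test runner like Jest):

```typescript
// Hypothetical unit under test: a coupon calculator with no dependencies.
// Prices are whole currency units; results round to the nearest unit.
function applyCoupon(price: number, code: string): number {
  if (code === "SAVE10") return Math.round(price * 0.9);
  if (code === "SAVE20") return Math.round(price * 0.8);
  return price; // unknown codes leave the price unchanged
}

// The unit test exercises the isolated function directly.
if (applyCoupon(100, "SAVE10") !== 90) throw new Error("10% coupon failed");
if (applyCoupon(100, "SAVE20") !== 80) throw new Error("20% coupon failed");
if (applyCoupon(100, "BOGUS") !== 100) throw new Error("unknown code should be a no-op");
```

Because the function has no dependencies, the test is fast and cheap, but it says nothing about the components that call it.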


What are the pros and cons of unit tests?

A single small part can be checked quickly and cheaply, but you can't tell whether the whole still works once the parts are assembled. Because each test covers such a narrow scope, you need many more tests to verify the same amount of functionality.

Unit tests typically test something small that has no dependencies or will mock those dependencies (effectively swapping what could be thousands of lines of code with only a few).

The lower down the trophy you are, the less code your tests are testing. If you're operating at a low level you need more tests to cover the same number of lines of code in your application as a single test could higher up the trophy. In fact, as you go lower down the testing trophy, there are some things that are impossible to test.


If you mock a dependency in a unit test and the call assertions pass, can you conclude the real integration also works?

No. Because unit tests replace external parts with fakes (mocks), they cannot confirm that the code communicates correctly with the real parts. Having good chemistry with a stand-in at rehearsal doesn't guarantee the same on opening night.

Unit tests are incapable of ensuring that when you call into a dependency that you're calling it appropriately (though you can make assertions on how it's being called, you can't ensure that it's being called properly with a unit test).
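A sketch of exactly this failure mode, with a hand-rolled mock and a hypothetical payment client (all names invented for illustration): the assertion on how the mock is called passes, yet the real dependency would reject the same call.

```typescript
// Hypothetical real dependency: expects an amount in integer cents.
const realPaymentClient = {
  charge(amountInCents: number): string {
    if (!Number.isInteger(amountInCents)) throw new Error("amount must be integer cents");
    return `charged ${amountInCents}`;
  },
};

// Unit under test, with a bug: it passes dollars where cents are expected.
function checkout(client: { charge(amount: number): string }, dollars: number): string {
  return client.charge(dollars); // bug: should be Math.round(dollars * 100)
}

// Hand-rolled mock that records calls instead of doing real work.
const calls: number[] = [];
const mockClient = {
  charge(amount: number): string {
    calls.push(amount);
    return "ok";
  },
};

checkout(mockClient, 19.99);

// The call assertion passes: the mock was called once, with 19.99...
if (calls.length !== 1 || calls[0] !== 19.99) throw new Error("mock not called as expected");

// ...yet the real client rejects that same call, which the unit test never exercised.
let realFailed = false;
try {
  checkout(realPaymentClient, 19.99);
} catch {
  realFailed = true;
}
console.log(realFailed); // the real integration breaks even though the mocked test passed
```

The mock faithfully reports *how* the dependency was called, but only a test that wires in the real dependency can catch that it was called *wrongly*.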


What is an integration test?

A test that actually assembles several parts and checks that they work together. It keeps fake parts (mocks) to a minimum and tests under conditions close to the real environment.

Integration: Verify that several units work together in harmony.

The idea behind integration tests is to mock as little as possible.
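A sketch of the idea, borrowing the form/URL-generator pairing the article mentions later (both helper functions are hypothetical): two real units are wired together with no mocks, and the test exercises the seam between them.

```typescript
// Two real units, composed with no mocks.
function normalizeQuery(raw: string): string {
  return raw.trim().toLowerCase();
}

function buildSearchUrl(query: string): string {
  return `/search?q=${encodeURIComponent(query)}`;
}

// The integration point: does the output of one unit flow correctly into the
// other? A unit test of either function alone could not catch a mismatch here.
function searchUrlForInput(raw: string): string {
  return buildSearchUrl(normalizeQuery(raw));
}

console.log(searchUrlForInput("  Testing Trophy ")); // → /search?q=testing%20trophy
```

One test covers both units plus the wiring between them, which is why a small number of integration tests can cover so much of the application.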


What are the pros and cons of integration tests?

They are important enough to take up the largest share of the Testing Trophy, and a small number of tests can verify a wide range of behavior. Only network requests and animations are faked, so the tests stay close to real behavior.

The size of these forms of testing on the trophy is relative to the amount of focus you should give them when testing your applications (in general).

The idea behind integration tests is to mock as little as possible. I pretty much only mock:

  1. Network requests (using MSW)
  2. Components responsible for animation (because who wants to wait for that in your tests?)

What is the criterion for what to mock and what not to mock in integration tests?

Only two things are replaced with fakes: network requests and animations. The network is mocked to avoid depending on an external server, animations are mocked for test speed, and everything else runs as real code.

I pretty much only mock:

  1. Network requests (using MSW)
  2. Components responsible for animation (because who wants to wait for that in your tests?)
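A hand-rolled stand-in for what MSW does at the network boundary (MSW itself intercepts requests via service workers / request interception; this stub, and the `fetchUserName` function, are only illustrative): the fetch call is faked, while everything above it runs as real code.

```typescript
// Real code path under test: fetches a user and extracts the name.
// The fetch implementation is injected so a test can substitute the boundary.
async function fetchUserName(fetchFn: typeof fetch, id: number): Promise<string> {
  const res = await fetchFn(`/api/users/${id}`);
  const body = (await res.json()) as { name: string };
  return body.name;
}

// Only the network boundary is faked; no other dependency is mocked.
const fakeFetch = (async (_url: RequestInfo | URL) => {
  return new Response(JSON.stringify({ name: "Aunt Marie" }), {
    headers: { "Content-Type": "application/json" },
  });
}) as typeof fetch;

fetchUserName(fakeFetch, 1).then((name) => console.log(name)); // prints "Aunt Marie"
```

The parsing, typing, and data flow above the network boundary are all exercised for real; only the external server is removed from the equation, which is the trade-off this level of the trophy makes.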

What problems can even integration tests not catch?

They cannot confirm that the UI (frontend) and the server (backend) exchange the right data, or that server errors are handled correctly. Even if the workflow inside a store is flawless, a problem in the hand-off to the delivery company can't be discovered from inside the store.

UI Integration tests are incapable of ensuring that you're passing the right data to your backend and that you respond to and parse errors correctly.


What is an E2E test?

A test that clicks and types through the app like a real user to confirm that the whole thing works. With both the frontend and the backend running, a robot uses the app in place of a person.

End to End: A helper robot that behaves like a user to click around the app and verify that it functions correctly. Sometimes called "functional testing" or e2e.

Typically these will run the entire application (both frontend and backend) and your test will interact with the app just like a typical user would. These tests are written with cypress.


What are the pros and cons of E2E tests?

Because they most closely resemble real usage, they give the highest confidence, but with so many parts involved it is hard to track down where a failure came from. They are also the most expensive and slowest to run, though that is still better than not catching the problem at all.

An E2E test has more points of failure making it often harder to track down what code caused the breakage, but it also means that your test is giving you more confidence. This is especially useful if you don't have as much time to write tests. I'd rather have the confidence and be faced with tracking down why it's failing, than not having caught the problem via a test in the first place.

The higher up the trophy you go, the more points of failure there are and therefore the more likely it is that a test will break, leading to more time needed to analyze and fix the tests.

End to End tests are pretty darn capable, but typically you'll run these in a non-production environment (production-like, but not production) to trade-off that confidence for practicality.


Wouldn't catching every edge case with E2E tests be the most reliable approach?

No. E2E tests require spinning up the entire system, so their setup cost is high; trying to cover every small edge case at the E2E level is wasteful. The key is to assign each kind of problem to the test level that fits it.

At the top of the testing trophy, if you try to use an E2E test to check that typing in a certain field and clicking the submit button for an edge case in the integration between the form and the URL generator, you're doing a lot of setup work by running the entire application (backend included). That might be more suitable for an integration test. If you try to use an integration test to hit an edge case for the coupon code calculator, you're likely doing a fair amount of work in your setup function to make sure you can render the components that use the coupon code calculator and you could cover that edge case better in a unit test. If you try to use a unit test to verify what happens when you call your add function with a string instead of a number you could be much better served using a static type checking tool like TypeScript.


Why use the Testing Trophy instead of the Testing Pyramid?

The pyramid model considers only cost and speed, and so concentrates on unit tests; the Trophy model adds a confidence criterion: how closely do the tests resemble the way the software is actually used? Balancing all three — cost, speed, and confidence — leads to a strategy that gives integration tests the largest share.

As you move up the testing trophy, the tests become more costly. This comes in the form of actual money to run the tests in a continuous integration environment, but also in the time it takes engineers to write and maintain each individual test.

As you move up the testing trophy, the tests typically run slower. This is due to the fact that the higher you are on the testing trophy, the more code your test is running.

The cost and speed trade-offs are typically referenced when people talk about the testing pyramid. If those were the only trade-offs though, then I would focus 100% of my efforts on unit tests and totally ignore any other form of testing. Of course we shouldn't do that and this is because of one super important principle that you've probably heard me say before:

The more your tests resemble the way your software is used, the more confidence they can give you.

What does this mean? It means that there's no better way to ensure that your Aunt Marie will be able to file her taxes using your tax software than actually having her do it. But we don't want to wait on Aunt Marie to find our bugs for us right? It would take too long and she'd probably miss some features that we should probably be testing. Compound that with the fact that we're regularly releasing updates to our software, and there's no way any amount of humans would be able to keep up.

So what do we do? We make trade-offs. And how do we do that? We write software that tests our software. And the trade-off we're always making when we do that is now our tests don't resemble the way our software is used as reliably as when we had Aunt Marie testing our software. But we do it because we solve real problems we had with that approach. And that's what we're doing at every level of the testing trophy.


What is the confidence coefficient?

The relative confidence each level of testing provides. The higher up the trophy you go, the more confidence a single test gives, but the more cost and time it takes. You can imagine manual human testing sitting above the top of the trophy.

As you move up the testing trophy, you're increasing what I call the "confidence coefficient." This is the relative confidence that each test can get you at that level. You can imagine that above the trophy is manual testing. That would get you really great confidence from those tests, but the tests would be really expensive and slow.


What can each test level not catch, in summary?

Static analysis can't catch business-logic errors, unit tests can't catch problems in the connections between parts, integration tests can't catch data hand-off problems between frontend and backend, and E2E tests don't run in an environment identical to production. Every level has its own blind spot.

In particular, static analysis tools are incapable of giving you confidence in your business logic. Unit tests are incapable of ensuring that when you call into a dependency that you're calling it appropriately (though you can make assertions on how it's being called, you can't ensure that it's being called properly with a unit test). UI Integration tests are incapable of ensuring that you're passing the right data to your backend and that you respond to and parse errors correctly. End to End tests are pretty darn capable, but typically you'll run these in a non-production environment (production-like, but not production) to trade-off that confidence for practicality.


In a testing strategy, what judgment matters more than how a test is classified?

More than labels like "unit" or "integration," the core question is: how confident am I that the business requirements are satisfied when I ship my code? That confidence is the biggest and most important reason for writing tests.

In the end I don't really care about the distinctions. If you want to call my unit tests integration tests or even E2E tests (as some people have) then so be it. What I'm interested in is whether I'm confident that when I ship my changes, my code satisfies the business requirements and I'll use a mix of the different testing strategies to accomplish that goal.

The biggest and most important reason that I write tests is CONFIDENCE. I want to be confident that the code I'm writing for the future won't break the app that I have running in production today. So whatever I do, I want to make sure that the kinds of tests I write bring me the most confidence possible and I need to be cognizant of the trade-offs I'm making when testing.