가비지 컬렉션 기본 원리

Python은 두 가지 방식으로 메모리를 관리한다:

Reference Counting (주된 방식)
Generational Garbage Collection (순환 참조 처리)

Reference Counting

모든 Python 객체는 참조 횟수를 추적하는 카운터를 가진다.

import sys
 
# 객체 생성
x = []
print(sys.getrefcount(x))  # 2 (x와 getrefcount 인자로 전달될 때의 참조)
 
# 참조 추가
y = x
print(sys.getrefcount(x))  # 3
 
# 참조 제거
del y
print(sys.getrefcount(x))  # 2

참조 카운트가 증가하는 경우

def demonstrate_ref_counting():
    # 1. 변수 할당
    x = []  # refcount: 1
    
    # 2. 함수 인자로 전달
    def func(arg):
        pass
    func(x)  # refcount: 2 (일시적)
    
    # 3. 컨테이너에 저장
    lst = [x]  # refcount: 2
    
    # 4. 속성 할당
    class Sample:
        pass
    obj = Sample()
    obj.attr = x  # refcount: 3

참조 카운트가 감소하는 경우

# 1. 변수가 스코프를 벗어날 때
def scope_example():
    x = []
    return  # x의 refcount 감소
 
# 2. del 사용
x = []
del x  # refcount 감소
 
# 3. 다른 객체 할당
x = []
x = "new value"  # 원래 리스트의 refcount 감소

순환 참조 문제

순환 참조 예시

class Node:
    def __init__(self):
        self.next = None
 
def create_cycle():
    node1 = Node()
    node2 = Node()
    
    # 순환 참조 생성
    node1.next = node2
    node2.next = node1
    
    # 함수를 벗어나도 메모리가 해제되지 않음

약한 참조를 통한 해결

import weakref
 
class Node:
    def __init__(self):
        self.next_weak = None
    
    def set_next(self, next_node):
        # 약한 참조 사용
        self.next_weak = weakref.ref(next_node)
 
node1 = Node()
node2 = Node()
node1.set_next(node2)
node2.set_next(node1)
# 순환 참조가 발생하지 않음

Generational Garbage Collection

Python의 GC는 세 세대(generation)로 객체를 관리한다.

import gc
 
# GC 상태 확인
print(gc.get_count())  # 각 세대별 객체 수
print(gc.get_threshold())  # 각 세대별 임계값
 
# 수동으로 GC 실행
gc.collect()

GC 제어

# GC 비활성화 (메모리 사용량이 많은 작업에서 유용)
gc.disable()
 
# 중요한 작업
expensive_operation()
 
# GC 다시 활성화
gc.enable()
 
# 특정 세대만 수집
gc.collect(generation=2)

메모리 누수 디버깅

객체 추적

import tracemalloc
 
# 메모리 추적 시작
tracemalloc.start()
 
# 코드 실행
def potential_leak():
    large_list = [0] * 1000000
    return "Done"
 
potential_leak()
 
# 메모리 사용량 확인
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
 
print("[ Top 10 memory users ]")
for stat in top_stats[:10]:
    print(stat)

메모리 프로파일링

from memory_profiler import profile
 
@profile
def memory_heavy_function():
    large_list = [0] * 1000000
    del large_list
    another_list = [1] * 2000000
    return "Done"
 
memory_heavy_function()

성능 최적화 팁

1. 대용량 객체 처리

def process_large_data():
    # 제너레이터 사용으로 메모리 사용 최적화
    def generate_data():
        for i in range(1000000):
            yield i ** 2
    
    # 한 번에 전체 리스트를 만들지 않음
    for item in generate_data():
        process_item(item)

2. 캐시 관리

import functools
 
@functools.lru_cache(maxsize=128)
def expensive_computation(n):
    # 결과가 캐시됨
    return sum(i * i for i in range(n))

주의사항

명시적 메모리 해제

# 큰 객체 사용 후 명시적으로 해제
large_data = process_huge_file()
result = compute_result(large_data)
del large_data  # 메모리 즉시 해제 가능

순환 참조 방지

class Parent:
    def __init__(self):
        self.children = []
    
    def add_child(self, child):
        self.children.append(child)
 
class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)  # 약한 참조 사용

결론

효율적인 메모리 관리를 위해서는 다음 사항들을 고려해야 한다:

Reference counting의 동작 방식 이해
순환 참조 상황 인지 및 방지
적절한 시점의 GC 제어
메모리 사용량 모니터링
대용량 데이터 처리 시 최적화 기법 활용

🚀 Taegi Taeknology

탐색기

가비지 컬렉터와 메모리 관리

가비지 컬렉션 기본 원리

Reference Counting

참조 카운트가 증가하는 경우

참조 카운트가 감소하는 경우

순환 참조 문제

순환 참조 예시

약한 참조를 통한 해결

Generational Garbage Collection

GC 제어

메모리 누수 디버깅

객체 추적

메모리 프로파일링

성능 최적화 팁

1. 대용량 객체 처리

2. 캐시 관리

주의사항

결론

그래프 뷰

목차

최근 게시글

OCSF(Open Cybersecurity Schema Framework)

허니팟

Django SQL 쿼리 디버깅