* Numerical Python - Numpy

💡 This post collects my study notes from the boostcourse lecture series 인공지능(AI) 기초 다지기 (AI Fundamentals). For more detail, I recommend taking the actual course 😃

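🐍 What is Numpy (Numerical Python)?

- A scientific computing package for Python

- A high-performance package for scientific computation in Python

- The de facto standard for array operations such as matrices and vectors

Features

- Faster and more memory-efficient than a regular List

- Supports processing whole data arrays without loops

- Provides a wide range of linear algebra functionality

- Can be integrated with languages such as C, C++, and Fortran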

📝 Contents

numpy
ndarray
Handling shape
Indexing
Slicing
Creation function
Operation functions
array operations
Comparisons
Boolean Index
Fancy Index
numpy data i/o

Importing the numpy module¶

In [1]:

import numpy as np

Creating an array¶

numpy creates arrays with the np.array function -> ndarray
A numpy array can hold only one data type
This is the biggest difference from a List: dynamic typing is not supported

In [3]:

test_array = np.array(["1", "4", 5.55, 8], np.float32)
print(test_array)
print(type(test_array[3]))
print(test_array.shape)
# float64 => each element occupies 64 bits of memory (this array uses float32, so 32 bits per element)
# 32 bits => 4 bytes / 64 bits => 8 bytes / 8 bits => 1 byte

[1.   4.   5.55 8.  ]
<class 'numpy.float32'>
(4,)
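
The element sizes mentioned in the comments can be checked directly. This is a minimal sketch, assuming the same four values stored as float32 and then converted to float64; the names arr32 and arr64 are only for illustration.

```python
import numpy as np

arr32 = np.array([1, 4, 5.55, 8], dtype=np.float32)
arr64 = arr32.astype(np.float64)  # same values stored as 64-bit floats

# itemsize: bytes per element, nbytes: bytes used by all elements
print(arr32.itemsize, arr32.nbytes)  # 4 16
print(arr64.itemsize, arr64.nbytes)  # 8 32
```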

In [4]:

test_array = np.array([["1", "4", 5.55, 8]], np.float32)
test_array.shape

Out[4]:

(1, 4)

In [5]:

test_array.dtype    # dtype: returns the data type of the np.array

Out[5]:

dtype('float32')

In [6]:

# shape: returns the dimensions of the np.array object
test_array.shape    # two-dimensional here; the shape is a tuple

Out[6]:

(1, 4)

In [7]:

# 3rd order tensor
tensor = [[[1,2,3,4], [1,2,3,4], [1,2,3,4]],
          [[1,2,3,4], [1,2,3,4], [1,2,3,4]],
          [[1,2,3,4], [1,2,3,4], [1,2,3,4]],
          [[1,2,3,4], [1,2,3,4], [1,2,3,4]]]
# shape of the ndarray (type: tuple)
# depth, row, and column of the tensor
np.array(tensor, int).shape

Out[7]:

(4, 3, 4)

😎 Handling shape

reshape¶

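Keeps the number of elements the same and changes only the shape
-1 : the length of that dimension (for example, the number of rows) is inferred from the array's size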
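
A short sketch of reshape, assuming a small 8-element array made up for illustration:

```python
import numpy as np

matrix = np.arange(8).reshape(2, 4)   # 8 elements arranged as 2 rows x 4 columns
print(matrix.shape)                   # (2, 4)

# -1 lets numpy infer that dimension: 8 elements / 2 columns = 4 rows
inferred = matrix.reshape(-1, 2)
print(inferred.shape)                 # (4, 2)

# The element count must stay the same, so an incompatible shape raises ValueError
try:
    matrix.reshape(3, 3)
except ValueError as err:
    print("cannot reshape:", err)
```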

flatten¶

Converts a multi-dimensional array into a 1-D array
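
A small sketch of flatten on a hypothetical 3-D array. It also shows that flatten returns a copy, so editing the result does not change the original:

```python
import numpy as np

tensor = np.arange(24).reshape(2, 3, 4)
flat = tensor.flatten()     # returns a new 1-D copy
print(flat.shape)           # (24,)

flat[0] = 100               # modifying the copy leaves the original untouched
print(tensor[0, 0, 0])      # 0
```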

😎 Indexing & slicing

slicing¶

Used to take only part of the data
ex) [:2, :] = rows 0 to 1, all columns
x:y:z => x = start, y = stop (not included), z = step
[:, ::2] selects every second column
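
The slicing patterns above can be tried on a small matrix. A minimal sketch, assuming a 3 x 4 array made up for illustration:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_two_rows = a[:2, :]      # rows 0 and 1, every column
every_other_col = a[:, ::2]    # all rows, columns 0 and 2
single = a[1, 2]               # indexing one element: row 1, column 2

print(first_two_rows.shape)    # (2, 4)
print(every_other_col)
print(single)                  # 6
```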

😎 creation function

arange¶

In [8]:

# Works like Python's range; returns an array of the integers 0 to 29
np.arange(30)

Out[8]:

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [9]:

# Often used together with reshape
np.arange(30).reshape(-1, 5)

Out[9]:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

In [10]:

# Floating-point steps also work
# (start, stop, step)
np.arange(0, 5, 0.5)

Out[10]:

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [11]:

# Convert to a Python list
np.arange(0, 5, 0.5).tolist()

Out[11]:

[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

ones, zeros and empty¶

zeros = creates an ndarray filled with 0: np.zeros(shape, dtype, order)
ones = creates an ndarray filled with 1: np.ones(shape, dtype, order)
empty = creates an ndarray of the given shape without initializing its memory, so the contents are arbitrary (rarely used)

In [12]:

# Create a zero vector of length 10
np.zeros(shape=(10,), dtype=np.int8)

Out[12]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int8)

In [13]:

# Create a 2 by 5 zero matrix
np.zeros((2,5))

Out[13]:

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [14]:

np.ones(shape=(10,), dtype=np.int8)

Out[14]:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int8)

In [15]:

np.ones((2,5))

Out[15]:

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [16]:

np.empty(shape=(10,), dtype=np.int8)

Out[16]:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int8)

In [17]:

# Allocates the memory but does not initialize it
np.empty((3,5))

Out[17]:

array([[0.        , 0.        , 0.4472136 , 0.0531494 , 0.18257419],
       [0.4472136 , 0.2125976 , 0.36514837, 0.4472136 , 0.4783446 ],
       [0.54772256, 0.4472136 , 0.85039041, 0.73029674, 0.4472136 ]])

something_like¶

Returns an array of ones, zeros, or uninitialized values with the same shape as an existing ndarray

In [18]:

test_matrix = np.arange(30).reshape(5,6)
np.ones_like(test_matrix)

Out[18]:

array([[1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1]])
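
The other two members of this family work the same way. A brief sketch, assuming the same test_matrix as above; the dtype override in the last call is an optional extra not covered in the notes:

```python
import numpy as np

test_matrix = np.arange(30).reshape(5, 6)

zeros = np.zeros_like(test_matrix)                # same shape and dtype, filled with 0
empty = np.empty_like(test_matrix)                # same shape and dtype, contents uninitialized
ones_f = np.ones_like(test_matrix, dtype=float)   # the dtype can be overridden

print(zeros.shape, zeros.dtype == test_matrix.dtype)  # (5, 6) True
print(ones_f.dtype)                                   # float64
```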

identity¶

Creates an identity matrix (I)
n -> number of rows

In [19]:

np.identity(n=3, dtype=np.int8)

Out[19]:

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int8)

In [20]:

np.identity(5)

Out[20]:

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

eye¶

A matrix with ones on a diagonal; the k value changes the index at which the diagonal starts

In [21]:

np.eye(N=3, M=5, dtype=np.int8)

Out[21]:

array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0]], dtype=int8)

In [22]:

np.eye(3)

Out[22]:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [23]:

np.eye(3,5,k=2)    # k -> start index

Out[23]:

array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

diag¶

Extracts the values on the diagonal of a matrix

In [24]:

matrix = np.arange(9).reshape(3,3)
np.diag(matrix)

Out[24]:

array([0, 4, 8])

In [25]:

np.diag(matrix, k=1)    # k -> start index

Out[25]:

array([1, 5])

random sampling¶

Creates an array by sampling from a data distribution

In [26]:

np.random.uniform(0,1,10).reshape(2,5)    # uniform distribution

Out[26]:

array([[0.21408305, 0.98803664, 0.65395595, 0.53426039, 0.92799067],
       [0.01244071, 0.4638389 , 0.9239366 , 0.88648509, 0.72340139]])

In [27]:

np.random.normal(0,1,10).reshape(2,5)    # normal distribution

Out[27]:

array([[ 0.15540571, -0.23307988,  1.84257156,  1.73167696,  0.19199975],
       [ 0.87701912,  0.19438749,  0.65091444, -0.17108458, -0.42799948]])
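
The sampled values above differ on every run. If reproducible samples are needed, one option is a seeded generator. This is a sketch using np.random.default_rng rather than the np.random.uniform and np.random.normal calls from the notes, and the seed 42 is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(42)            # seed chosen arbitrarily

uniform = rng.uniform(0, 1, size=(2, 5))   # uniform distribution on [0, 1)
normal = rng.normal(0, 1, size=(2, 5))     # normal distribution, mean 0, std 1

# Re-creating the generator with the same seed reproduces the same samples
rng_again = np.random.default_rng(42)
same = np.array_equal(uniform, rng_again.uniform(0, 1, size=(2, 5)))
print(same)  # True
```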

😎 operation function

sum¶

Computes the sum of the ndarray's elements, like the built-in sum for a list

In [28]:

test_array = np.arange(1,11)
test_array

Out[28]:

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [29]:

test_array.sum(dtype=float)

Out[29]:

55.0

axis 🌟¶

The dimension (axis) along which an operation function is applied

In [30]:

test_array = np.arange(1,13).reshape(3,4)
test_array

Out[30]:

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [31]:

# (a newly added dimension always becomes axis 0) axis=0: sum down each column (vertical), axis=1: sum across each row (horizontal)
test_array.sum(axis=0), test_array.sum(axis=1)

Out[31]:

(array([15, 18, 21, 24]), array([10, 26, 42]))
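
The same idea extends to higher dimensions: summing along an axis removes that axis from the shape. A minimal sketch, assuming a hypothetical 3-D tensor:

```python
import numpy as np

tensor = np.arange(24).reshape(2, 3, 4)   # shape (depth, row, column)

# Summing along an axis removes that axis from the shape
print(tensor.sum(axis=0).shape)   # (3, 4)
print(tensor.sum(axis=1).shape)   # (2, 4)
print(tensor.sum(axis=2).shape)   # (2, 3)
```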

mean & std¶

Returns the mean or the standard deviation of the ndarray's elements

In [32]:

test_array = np.arange(1, 13).reshape(3,4)
test_array

Out[32]:

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [33]:

test_array.mean(), test_array.mean(axis=0)

Out[33]:

(6.5, array([5., 6., 7., 8.]))

In [34]:

test_array.std(), test_array.std(axis=0)    # standard deviation

Out[34]:

(3.452052529534663, array([3.26598632, 3.26598632, 3.26598632, 3.26598632]))
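
To see what std() computes, the value can be rebuilt by hand. A sketch assuming the same 3 x 4 test_array. By default numpy divides by n (population standard deviation), and the ddof=1 variant is shown only for comparison:

```python
import numpy as np

test_array = np.arange(1, 13).reshape(3, 4)

# Population standard deviation: sqrt of the mean squared deviation from the mean
manual_std = np.sqrt(((test_array - test_array.mean()) ** 2).mean())
print(np.isclose(manual_std, test_array.std()))   # True

# ddof=1 divides by (n - 1) instead of n, giving the sample standard deviation
sample_std = test_array.std(ddof=1)
print(sample_std > manual_std)                    # True
```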

Mathematical functions¶

Numpy also provides many other mathematical functions

In [35]:

np.exp(test_array)

Out[35]:

array([[2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01],
       [1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03],
       [8.10308393e+03, 2.20264658e+04, 5.98741417e+04, 1.62754791e+05]])

In [36]:

np.sqrt(test_array)

Out[36]:

array([[1.        , 1.41421356, 1.73205081, 2.        ],
       [2.23606798, 2.44948974, 2.64575131, 2.82842712],
       [3.        , 3.16227766, 3.31662479, 3.46410162]])

concatenate¶

Functions that join numpy arrays
vstack, hstack

In [37]:

# Stack two vectors
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.vstack((a,b))    

Out[37]:

array([[1, 2, 3],
       [2, 3, 4]])

In [38]:

a = np.array([[1], [2], [3]])
b = np.array([[2], [3], [4]])
np.hstack((a,b))

Out[38]:

array([[1, 2],
       [2, 3],
       [3, 4]])

In [39]:

# Join along an axis
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.concatenate((a,b), axis=0)   

Out[39]:

array([1, 2, 3, 2, 3, 4])

In [40]:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a,b.T), axis=1)

Out[40]:

array([[1, 2, 5],
       [3, 4, 6]])
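
For 2-D inputs, concatenate with axis=0 or axis=1 gives the same result as vstack or hstack. A short sketch with two made-up 2 x 2 matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a, b), axis=0)   # stack vertically -> shape (4, 2)
cols = np.concatenate((a, b), axis=1)   # stack horizontally -> shape (2, 4)

print(np.array_equal(rows, np.vstack((a, b))))  # True
print(np.array_equal(cols, np.hstack((a, b))))  # True
```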

😎 array operations

Operations b/t arrays 🌟¶

Numpy supports the basic arithmetic operations between arrays

In [41]:

test_a = np.array([[1,2,3],[4,5,6]], float)

In [42]:

test_a + test_a    # Matrix + Matrix

Out[42]:

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [43]:

test_a - test_a    # Matrix - Matrix

Out[43]:

array([[0., 0., 0.],
       [0., 0., 0.]])

In [44]:

test_a * test_a    # Multiplies the elements at the same position in each matrix

Out[44]:

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

Element-wise operations¶

Operations applied element by element when the arrays have the same shape

In [45]:

matrix_a = np.arange(1,13).reshape(3,4)
matrix_a * matrix_a

Out[45]:

array([[  1,   4,   9,  16],
       [ 25,  36,  49,  64],
       [ 81, 100, 121, 144]])

Dot product¶

The basic matrix operation (matrix product)

In [46]:

test_a = np.arange(1,7).reshape(2,3)
test_b = np.arange(7,13).reshape(3,2)
test_a.dot(test_b)

Out[46]:

array([[ 58,  64],
       [139, 154]])
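
For 2-D arrays the @ operator gives the same result as dot. A brief sketch on the same two matrices, with one entry worked out by hand:

```python
import numpy as np

test_a = np.arange(1, 7).reshape(2, 3)
test_b = np.arange(7, 13).reshape(3, 2)

product = test_a @ test_b     # same result as test_a.dot(test_b) for 2-D arrays
print(product)

# Entry [0, 0] is row 0 of test_a times column 0 of test_b: 1*7 + 2*9 + 3*11 = 58
print(product[0, 0])          # 58
```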

transpose¶

Use the transpose method or the T attribute

In [47]:

test_a = np.arange(1,7).reshape(2,3)
test_a

Out[47]:

array([[1, 2, 3],
       [4, 5, 6]])

In [48]:

test_a.T.dot(test_a)    # Matrix multiplication

Out[48]:

array([[17, 22, 27],
       [22, 29, 36],
       [27, 36, 45]])

In [49]:

test_a.T

Out[49]:

array([[1, 4],
       [2, 5],
       [3, 6]])

In [50]:

test_a.transpose()

Out[50]:

array([[1, 4],
       [2, 5],
       [3, 6]])

broadcasting 🌟¶

A feature that supports operations between arrays of different shapes

In [51]:

test_matrix = np.array([[1,2,3],[4,5,6]], float)
scalar = 3

In [52]:

test_matrix + scalar    # Matrix + Scalar addition

Out[52]:

array([[4., 5., 6.],
       [7., 8., 9.]])

In [53]:

test_matrix - scalar    # Matrix - Scalar subtraction

Out[53]:

array([[-2., -1.,  0.],
       [ 1.,  2.,  3.]])

In [54]:

test_matrix * 5    # Matrix * Scalar multiplication

Out[54]:

array([[ 5., 10., 15.],
       [20., 25., 30.]])

In [55]:

test_matrix / 5    # Matrix / Scalar division

Out[55]:

array([[0.2, 0.4, 0.6],
       [0.8, 1. , 1.2]])

In [56]:

test_matrix // 0.2    # Matrix // Scalar floor division (quotient)

Out[56]:

array([[ 4.,  9., 14.],
       [19., 24., 29.]])

In [57]:

test_matrix ** 2    # Matrix ** Scalar power

Out[57]:

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])
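
Broadcasting is not limited to scalars. It also works between a matrix and a vector, or between a column vector and a row vector. A minimal sketch with made-up shapes:

```python
import numpy as np

test_matrix = np.arange(1, 13).reshape(4, 3)   # shape (4, 3)
test_vector = np.arange(10, 40, 10)            # shape (3,): [10, 20, 30]

# The vector is stretched across every row of the matrix
print(test_matrix + test_vector)

# A column vector (4, 1) and a row vector (3,) broadcast to a (4, 3) result
column = np.arange(4).reshape(4, 1)
print((column + test_vector).shape)            # (4, 3)
```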

Numpy performance¶

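%timeit: a Jupyter magic command that measures how long a piece of code takes to run
In general, speed ranks for loop < list comprehension < numpy (numpy is the fastest)
With 100,000,000 loop iterations, the performance difference is roughly 4x or more
Numpy is implemented in C, so it gains performance
at the cost of giving up dynamic typing, one of Python's defining features
It is the most commonly used option for large-scale computation
For operations such as concatenate, which allocate memory rather than compute, numpy has no speed advantage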

In [58]:

# def scalar_vector_product(scalar, vector):
#     result = []
#     for value in vector:
#         result.append(scalar * value)
#     return result

# iteration_max = 100000000

# vector = list(range(iteration_max))
# scalar = 2

# %timeit scalar_vector_product(scalar, vector) # performance with a for loop
# %timeit [scalar * value for value in range(iteration_max)] # performance with a list comprehension
# %timeit np.arange(iteration_max) * scalar # performance with numpy
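
The cell above is commented out and relies on the Jupyter-only %timeit magic. Outside a notebook, a similar comparison can be sketched with the standard library timeit module. The much smaller iteration count and the number=10 repetitions are assumptions chosen so that it finishes quickly, and the measured times depend on the machine:

```python
import timeit
import numpy as np

def scalar_vector_product(scalar, vector):
    """Multiply every element with an explicit for loop."""
    result = []
    for value in vector:
        result.append(scalar * value)
    return result

iteration_max = 100_000   # far smaller than the 100,000,000 used in the notes
vector = list(range(iteration_max))
scalar = 2

# Each call is repeated 10 times; the return value is the total elapsed seconds
loop_time = timeit.timeit(lambda: scalar_vector_product(scalar, vector), number=10)
comp_time = timeit.timeit(lambda: [scalar * value for value in vector], number=10)
numpy_time = timeit.timeit(lambda: np.arange(iteration_max) * scalar, number=10)

print(f"for loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s, numpy: {numpy_time:.4f}s")
```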

😎 comparisons

All & Any 🌟¶

Returns whether all (and) or any (or) of the array's elements satisfy a condition

In [59]:

a = np.arange(10)
a

Out[59]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [60]:

# When the sizes match (a scalar is broadcast to the array's size),
# numpy compares element by element and returns the results as Boolean values
a>5

Out[60]:

array([False, False, False, False, False, False,  True,  True,  True,
        True])

In [61]:

np.any(a>5), np.any(a<0)

Out[61]:

(True, False)

In [62]:

np.all(a>5), np.all(a<10)

Out[62]:

(False, True)

In [63]:

# logical functions (rarely used)
# AND condition
a = np.array([1, 3, 0], float)
np.logical_and(a > 0, a < 3)

Out[63]:

array([ True, False, False])

In [64]:

# NOT condition
b = np.array([True, False, True], bool)
np.logical_not(b)

Out[64]:

array([False,  True, False])

In [65]:

# OR condition
c = np.array([False, True, False], bool)
np.logical_or(b, c)

Out[65]:

array([ True,  True,  True])
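
The same conditions are often written with the &, | and ~ operators on Boolean arrays instead of the logical_* functions. A short sketch; the thresholds are made up for illustration:

```python
import numpy as np

a = np.array([1, 3, 0], float)

# Parentheses are required because & and | bind tighter than comparisons
both = (a > 0) & (a < 3)      # same as np.logical_and(a > 0, a < 3)
either = (a > 2) | (a < 1)    # same as np.logical_or(a > 2, a < 1)
negated = ~(a > 0)            # same as np.logical_not(a > 0)

print(both)     # [ True False False]
print(either)   # [False  True  True]
print(negated)  # [False False  True]
```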

np.where¶

where(condition, value if True, value if False)

In [66]:

a = np.array([1, 3, 0], float)
a

Out[66]:

array([1., 3., 0.])

In [67]:

# With only a condition, returns the indices where it holds; often used together with sorting
np.where(a>0)

Out[67]:

(array([0, 1]),)

In [68]:

np.where(a > 0, 3, 2)

Out[68]:

array([3, 3, 2])

In [69]:

a = np.arange(10)
np.where(a>5)

Out[69]:

(array([6, 7, 8, 9]),)
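
The two forms of np.where can be compared side by side. A minimal sketch on the same a = np.arange(10); the replacement value 0 is an arbitrary choice:

```python
import numpy as np

a = np.arange(10)

# With three arguments, each element comes from the second or third argument
clipped = np.where(a > 5, a, 0)       # keep values above 5, replace the rest with 0
print(clipped)                        # [0 0 0 0 0 0 6 7 8 9]

# With one argument, the returned indices can be used to look the values up again
idx = np.where(a > 5)
print(a[idx])                         # [6 7 8 9]
```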

In [70]:

# Not a Number
a = np.array([1, np.nan, np.inf], float)    # lowercase names; the np.NaN / np.Inf aliases were removed in NumPy 2.0
np.isnan(a)

Out[70]:

array([False,  True, False])

In [71]:

# is finite number
np.isfinite(a)

Out[71]:

array([ True, False, False])

argmax & argmin¶

Returns the index of the maximum or minimum value in the array

In [72]:

a = np.array([1,2,4,5,8,78,23,3])
np.argmax(a), np.argmin(a)

Out[72]:

(5, 0)

In [73]:

# Return the indices along an axis
a = np.array([[1,2,4,7],[9,88,6,45],[9,76,3,4,]])
np.argmax(a, axis=1), np.argmin(a, axis=0)

Out[73]:

(array([3, 1, 1]), array([0, 0, 2, 2]))

😎 boolean & fancy index

boolean index¶

A numpy array can extract the values that satisfy a condition, returned as an array
All of the comparison operation functions can be used as well

In [74]:

test_array = np.array([1, 4, 0, 2, 3, 8, 9, 7], float)
test_array > 3

Out[74]:

array([False,  True, False, False, False,  True,  True,  True])

In [75]:

# Extract only the elements at the indices where the condition is True
test_array[test_array > 3]

Out[75]:

array([4., 8., 9., 7.])

In [76]:

condition = test_array < 3
test_array[condition]

Out[76]:

array([1., 0., 2.])

In [77]:

A = np.array([1, 2, 3, 4, 5])
B = A < 3
B

Out[77]:

array([ True,  True, False, False, False])

In [78]:

B.astype(int)

Out[78]:

array([1, 1, 0, 0, 0])
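
Boolean masks can also be combined, counted, and used for assignment. A sketch on the same test_array; the thresholds and the cap value 5 are made up for illustration:

```python
import numpy as np

test_array = np.array([1, 4, 0, 2, 3, 8, 9, 7], float)

mask = (test_array > 1) & (test_array < 8)   # combine two conditions
print(test_array[mask])                      # [4. 2. 3. 7.]

# True counts as 1, so summing a mask counts the matching elements
print(mask.sum())                            # 4

# A mask can also select the elements to overwrite
capped = test_array.copy()
capped[capped > 5] = 5
print(capped)                                # [1. 4. 0. 2. 3. 5. 5. 5.]
```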

fancy index¶

A way to extract values by using an array as the index values

In [79]:

a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)    # must be declared as an integer array
a[b]    # bracket index: uses the values of b as indices to extract values from a

Out[79]:

array([2., 2., 4., 8., 6., 4.])

In [80]:

a.take(b)    # take function (recommended): same effect as bracket index

Out[80]:

array([2., 2., 4., 8., 6., 4.])

In [81]:

# Also works on matrix-shaped data
a = np.array([[1, 4], [9, 16]], float)
b = np.array([0, 0, 1, 1, 0], int)
c = np.array([0, 1, 1, 1, 1], int)
a[b,c]    # uses b as the row indices and c as the column indices

Out[81]:

array([ 1.,  4., 16., 16.,  4.])

😎 numpy data i/o

loadtxt & savetxt¶

Functions for reading and saving text data

In [82]:

# a = np.loadtxt('path')
a[:10]

Out[82]:

array([[ 1.,  4.],
       [ 9., 16.]])

In [83]:

# Type conversion
a_int = a.astype(int)
a_int[:3]

Out[83]:

array([[ 1,  4],
       [ 9, 16]])

In [84]:

# Save the data
np.savetxt('int_data.csv', a_int, delimiter=",")

numpy object - npy¶

Saves and loads data as a numpy object (pickle-like)
The data is stored as a binary file

In [85]:

np.save("npy_test", arr=a_int)

In [86]:

npy_array = np.load(file="npy_test.npy")
npy_array[:3]

Out[86]:

array([[ 1,  4],
       [ 9, 16]])
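
Both I/O paths can be checked with a round trip. This is a sketch under a few assumptions: it writes into a temporary directory rather than the working directory, and it passes fmt="%d" to savetxt so that integers are written as integers (the cell above uses the default format):

```python
import os
import tempfile
import numpy as np

a_int = np.array([[1, 4], [9, 16]])

with tempfile.TemporaryDirectory() as tmp_dir:   # scratch location for this sketch
    csv_path = os.path.join(tmp_dir, "int_data.csv")
    npy_path = os.path.join(tmp_dir, "npy_test.npy")

    # Text round trip: fmt="%d" writes integers instead of the default float notation
    np.savetxt(csv_path, a_int, delimiter=",", fmt="%d")
    from_csv = np.loadtxt(csv_path, delimiter=",", dtype=int)

    # Binary round trip: the .npy file keeps the dtype and shape
    np.save(npy_path, a_int)
    from_npy = np.load(npy_path)

print(np.array_equal(a_int, from_csv), np.array_equal(a_int, from_npy))  # True True
```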

+ To put a notebook you worked through in Jupyter into a blog post, use File -> Download as -> HTML and paste the result in HTML mode.

[Reference: https://dailyheumsi.tistory.com/37]

I recommend running the practice code yourself!

Rachel's Development Log (Rachel의 개발 기록)

[AI Fundamentals (인공지능(AI) 기초 다지기)] Numpy features / practice code
