File I/O with NumPy
25 min
NumPy provides functions to save arrays to files and load them back, which makes it possible to persist computation results, load datasets, and exchange data between programs. Its native binary formats are designed for array data: they record the dtype and shape alongside the values, so an array comes back exactly as it was saved.
Supported formats include .npy (a single array in binary form), .npz (an archive of multiple arrays, optionally compressed), text files (CSV, space-delimited), and raw binary files. Each has trade-offs: .npy and .npz are fast and preserve data types and shapes; text files are human-readable and portable but larger and slower; raw binary files written with tofile() are compact but store no dtype or shape information, so the reader must supply both. The choice of format affects both performance and compatibility.
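To make the raw binary trade-off concrete, here is a minimal sketch using ndarray.tofile() and np.fromfile(). The file name raw.bin and the temporary directory are assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np

original = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "raw.bin")  # hypothetical file name

    # tofile() writes only the raw bytes: no header, no dtype, no shape.
    original.tofile(raw_path)

    # The reader has to know the dtype in advance; the result is always 1-D.
    flat = np.fromfile(raw_path, dtype=np.int32)

# The shape must be restored by hand.
restored = flat.reshape(2, 3)

print(flat.shape)                          # (6,)
print(np.array_equal(restored, original))  # True
```

If the wrong dtype is passed to np.fromfile(), the bytes are silently reinterpreted, which is one reason .npy is usually the safer default.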
.npy format stores a single array in a binary format that records the dtype and shape along with the data. .npy files are fast to read and write, and values round-trip exactly, with no loss of precision. Use np.save() and np.load() for individual arrays. .npy is NumPy's standard format.
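As a small illustration of the round trip described above, the following sketch saves a float32 matrix and checks that the dtype and shape survive. The file name matrix.npy is an assumption, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import numpy as np

matrix = np.array([[1.5, 2.5], [3.5, 4.5]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "matrix.npy")  # hypothetical file name

    np.save(npy_path, matrix)   # writes the header (dtype, shape) plus the data
    loaded = np.load(npy_path)  # reads the whole array back into memory

print(loaded.dtype, loaded.shape)       # float32 (2, 2)
print(np.array_equal(loaded, matrix))   # True
```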
.npz format stores multiple arrays in a single file. An .npz file is a ZIP archive containing one .npy file per array; np.savez() writes the archive uncompressed, while np.savez_compressed() applies compression to reduce file size. Arrays are stored under names (the keyword arguments passed when saving) and retrieved by name after loading. .npz is well suited to saving related arrays together.
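The sketch below compares np.savez() with np.savez_compressed() on the same data and then reads one array back by name. The array names features and labels and the file names are assumptions made for the example; the exact file sizes will vary, so none are shown.

```python
import os
import tempfile

import numpy as np

# Highly repetitive data, so compression has something to work with.
features = np.zeros((1000, 100))
labels = np.arange(1000)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.npz")    # hypothetical file names
    packed_path = os.path.join(tmp_dir, "packed.npz")

    # Keyword arguments become the names of the arrays inside the archive.
    np.savez(plain_path, features=features, labels=labels)
    np.savez_compressed(packed_path, features=features, labels=labels)

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Arrays are read lazily, so fetch what is needed before the archive closes.
    with np.load(packed_path) as archive:
        names = sorted(archive.files)
        loaded_labels = archive["labels"]

print(names)                     # ['features', 'labels']
print(packed_size < plain_size)  # True
```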
Text file I/O (np.savetxt(), np.loadtxt()) produces human-readable files that other tools can open. Text files are larger and slower to read and write than binary files, but they are portable. You can specify the delimiter, the number format, and a header line. Note that np.savetxt() handles only 1-D and 2-D arrays, and np.loadtxt() returns floats unless a dtype is given.
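Here is one way the delimiter, format, and header options might be combined to write a small CSV file and read it back. The column names id and value and the file name table.csv are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

table = np.array([[1, 2.5], [2, 3.75], [3, 5.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "table.csv")  # hypothetical file name

    # comments="" stops savetxt from prefixing the header with "# ".
    np.savetxt(csv_path, table, delimiter=",", fmt="%.3f",
               header="id,value", comments="")

    with open(csv_path) as handle:
        first_line = handle.readline().strip()

    # Skip the header row, since it no longer looks like a comment.
    loaded = np.loadtxt(csv_path, delimiter=",", skiprows=1)

print(first_line)    # id,value
print(loaded.shape)  # (3, 2)
```

Leaving comments at its default would write the header as a "# " comment line, which np.loadtxt() skips automatically; the plain header shown here is friendlier to spreadsheet tools.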
Best practices: use .npy/.npz for data that stays within NumPy (fast, preserves types); use text files when other tools must read the data; specify formats and delimiters explicitly for text files; handle errors such as missing or unreadable files; and keep file size in mind when choosing a format.
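Error handling can be sketched as a small wrapper around np.load(). The helper name load_array is hypothetical, and the set of exceptions caught is an assumption about the most likely failures (a missing file, or a file that is not a valid .npy); real code may need to handle others.

```python
import os
import tempfile

import numpy as np


def load_array(path):
    """Return the array stored in a .npy file, or None if it cannot be read."""
    try:
        # allow_pickle=False refuses to unpickle arbitrary objects from the file.
        return np.load(path, allow_pickle=False)
    except FileNotFoundError:
        print(f"No such file: {path}")
    except (ValueError, EOFError) as error:
        print(f"Could not read {path}: {error}")
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.npy")        # hypothetical file names
    corrupt_path = os.path.join(tmp_dir, "corrupt.npy")
    missing_path = os.path.join(tmp_dir, "missing.npy")

    np.save(good_path, np.array([1, 2, 3]))
    with open(corrupt_path, "w") as handle:
        handle.write("not an array")

    good_result = load_array(good_path)
    corrupt_result = load_array(corrupt_path)
    missing_result = load_array(missing_path)

print(good_result)     # [1 2 3]
print(corrupt_result)  # None
print(missing_result)  # None
```

Returning None keeps the example short; raising a more specific exception or logging the failure may suit a larger program better.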
Key Concepts
- NumPy provides functions to save and load arrays from files.
- Supported formats: .npy, .npz, text files, binary files.
- .npy format stores single arrays efficiently.
- .npz format stores multiple arrays in one file, optionally compressed.
- Text file I/O enables human-readable storage.
Learning Objectives
Master
- Saving and loading arrays with .npy and .npz formats
- Using text file I/O for compatibility
- Understanding format differences and when to use each
- Handling file I/O errors appropriately
Develop
- Understanding data persistence strategies
- Designing efficient data storage workflows
- Appreciating file I/O's role in data science
Tips
- Use .npy for single arrays: it's fast and preserves types.
- Use .npz for multiple arrays: it keeps them in one file under names. Use np.savez_compressed() when file size matters.
- Use text files (savetxt/loadtxt) for compatibility with other tools.
- Specify delimiters and formats for text files.
Common Pitfalls
- Not understanding format differences, choosing wrong format.
- Not preserving data types, losing precision.
- Not handling file errors, causing crashes.
- Using text files for large arrays, causing slow I/O.
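The precision pitfall can be demonstrated with a short sketch: a coarse fmt string rounds the values on the way out, while the default format keeps enough digits for float64 to round-trip. The file names are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

values = np.array([0.123456789, 1.0 / 3.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    rounded_path = os.path.join(tmp_dir, "rounded.txt")  # hypothetical file names
    full_path = os.path.join(tmp_dir, "full.txt")

    # Two decimal places: everything after that is discarded for good.
    np.savetxt(rounded_path, values, fmt="%.2f")
    rounded = np.loadtxt(rounded_path)

    # The default fmt ('%.18e') writes enough digits to recover float64 values.
    np.savetxt(full_path, values)
    exact = np.loadtxt(full_path)

print(rounded)                          # [0.12 0.33]
print(np.array_equal(rounded, values))  # False
print(np.array_equal(exact, values))    # True
```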
Summary
- NumPy provides functions to save and load arrays from files.
- Different formats serve different purposes (.npy, .npz, text).
- Understanding file I/O is important for data persistence.
- Format selection affects performance and compatibility.
- File I/O is essential for data science workflows.
Exercise
Save and load NumPy arrays using different formats.
import numpy as np
# Create sample arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
# Save single array
np.save('array.npy', arr1)
print("Saved array to array.npy")
# Load single array
loaded_arr = np.load('array.npy')
print("Loaded array:", loaded_arr)
# Save multiple arrays
np.savez('arrays.npz', arr1=arr1, arr2=arr2)
print("Saved multiple arrays to arrays.npz")
# Load multiple arrays (the with block closes the archive when done)
with np.load('arrays.npz') as loaded_data:
    print("Loaded arrays:")
    print("arr1:", loaded_data['arr1'])
    print("arr2:")
    print(loaded_data['arr2'])
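# Save to text file (values are written in the default '%.18e' float format)
np.savetxt('array.txt', arr2)
print("Saved array to text file")
# Load from text file (np.loadtxt returns float64 unless dtype is given)
loaded_text = np.loadtxt('array.txt')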
print("Loaded from text file:")
print(loaded_text)
# Save with specific format
np.savetxt('array_formatted.txt', arr2, fmt='%.2f', delimiter=',')
print("Saved with custom format")
# Load with specific parameters
loaded_formatted = np.loadtxt('array_formatted.txt', delimiter=',')
print("Loaded with custom parameters:")
print(loaded_formatted)
# Save complex array
complex_arr = np.array([1+2j, 3+4j, 5+6j])
np.save('complex_array.npy', complex_arr)
print("Saved complex array")
# Load complex array
loaded_complex = np.load('complex_array.npy')
print("Loaded complex array:", loaded_complex)