How to Split Large Files by Size in Python?
Sometimes we need to read a large file and write it to a destination, but because of its size we cannot load it into memory all at once. Splitting a large file into fixed-size chunks is a common task in data processing, especially with data sets that do not fit in memory. In the example below, we split a large file into smaller files, each containing a fixed number of bytes or a fixed number of lines.
The following program splits a large file into smaller files of 1 MB each:
# Path of the large input file
file_location = "myinput.txt"
chunk_size = 1000000  # 1 MB in bytes
chunk = 0

# Open the file to break apart; "with" closes it automatically
with open(file_location, "rb") as fileR:
    byte = fileR.read(chunk_size)
    while byte:
        # Write the current chunk to its own numbered file
        fileN = "chunk" + str(chunk) + ".txt"
        with open(fileN, "wb") as fileT:
            fileT.write(byte)
        # Read the next chunk_size bytes
        byte = fileR.read(chunk_size)
        chunk += 1
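
The introduction also mentions splitting by a fixed number of lines. As a minimal sketch of that variant (the function name split_by_lines and the "part0.txt" output naming are illustrative, not part of the program above):

# Split a text file into smaller files of at most lines_per_file lines each.
def split_by_lines(file_location, lines_per_file=1000):
    part = 0
    lines = []
    with open(file_location, "r") as src:
        for line in src:
            lines.append(line)
            if len(lines) == lines_per_file:
                # Write a full part and start collecting the next one
                with open("part" + str(part) + ".txt", "w") as dst:
                    dst.writelines(lines)
                lines = []
                part += 1
    # Write any leftover lines as the final, possibly shorter, part
    if lines:
        with open("part" + str(part) + ".txt", "w") as dst:
            dst.writelines(lines)

Because the input is read line by line, memory use stays bounded by lines_per_file regardless of how large the input file is.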