Guide: encrypting a file with AES in Python

Python Encrypt File

Let’s use Python to encrypt a file with AES! ✨✨ We are going to write some Python code to encrypt a file on disk using AES and then decrypt the file to retrieve our original plaintext file. We will be using Python 3.8.10 for this example.

AES (Advanced Encryption Standard) was originally called Rijndael and is a symmetric block algorithm for encrypting or decrypting data. The standard was established by the U.S. National Institute of Standards and Technology (NIST) in 2001.

AES has a fixed block size of 128 bits (16 bytes) and supports three key lengths: 128, 192, or 256 bits (16, 24, or 32 bytes).
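To make the bit and byte figures concrete, here is a minimal sketch that generates a random key of each supported length using only the standard library. The names AES_KEY_SIZES_BITS and random_aes_key() are illustrative and do not come from any AES library.

```python
import secrets

AES_BLOCK_SIZE_BYTES = 128 // 8        # 16 bytes, the same for every key length
AES_KEY_SIZES_BITS = (128, 192, 256)   # the three key lengths AES supports


def random_aes_key(bits: int = 256) -> bytes:
    """Return a random key of the requested AES key length (hypothetical helper)."""
    if bits not in AES_KEY_SIZES_BITS:
        raise ValueError(f"AES keys must be 128, 192 or 256 bits, not {bits}")
    return secrets.token_bytes(bits // 8)


for bits in AES_KEY_SIZES_BITS:
    key = random_aes_key(bits)
    print(f"AES-{bits}: {len(key)}-byte key, {AES_BLOCK_SIZE_BYTES}-byte block")
```

A 256-bit key is therefore 32 bytes long, while the block size stays at 16 bytes whichever key length is chosen.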

We are going to use 3 libraries in this example. Only one of them requires installation, which we can do as follows:

pip install pycryptodomex

The other two modules, hashlib and os, are part of the Python 3 standard library and do not need to be installed. Now, let’s write our code:

from hashlib import md5
from Cryptodome.Cipher import AES
from os import urandom

def derive_key_and_iv(password, salt, key_length, iv_length):
    """Derive a key and IV from the password and salt by chaining MD5 digests."""
    d = d_i = b''
    while len(d) < key_length + iv_length:
        # each round hashes the previous digest together with the password and salt
        d_i = md5(d_i + password.encode() + salt).digest()
        d += d_i
    return d[:key_length], d[key_length:key_length + iv_length]

def encrypt(in_file, out_file, password, key_length=32):
    bs = AES.block_size #16 bytes
    salt = urandom(bs) #random salt bytes, written at the start of the output file
    key, iv = derive_key_and_iv(password, salt, key_length, bs)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    out_file.write(salt)
    finished = False

    while not finished:
        chunk = in_file.read(1024 * bs)
        if len(chunk) == 0 or len(chunk) % bs != 0: #final chunk: pad to a whole number of blocks
            padding_length = (bs - len(chunk) % bs) or bs
            chunk += bytes([padding_length]) * padding_length #each padding byte stores the padding length
            finished = True
        out_file.write(cipher.encrypt(chunk))

def decrypt(in_file, out_file, password, key_length=32):
    bs = AES.block_size
    salt = in_file.read(bs)
    key, iv = derive_key_and_iv(password, salt, key_length, bs)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    next_chunk = b'' #decrypted data is held back until we know whether it is the last chunk
    finished = False
    while not finished:
        chunk, next_chunk = next_chunk, cipher.decrypt(in_file.read(1024 * bs))
        if len(next_chunk) == 0: #nothing left to read, so chunk ends with the padding
            padding_length = chunk[-1]
            chunk = chunk[:-padding_length]
            finished = True
        out_file.write(chunk)


password = '12345' #shouldn't be something this simple

with open('infile.docx', 'rb') as in_file, open('outfile.docx', 'wb') as out_file:
    encrypt(in_file, out_file, password)

with open('outfile.docx', 'rb') as in_file, open('outfile_decrypted.docx', 'wb') as out_file:
    decrypt(in_file, out_file, password)

Let’s explain what is happening here:

  1. We import our libraries.
  2. You will need to prepare a file called infile.docx. This will be a Word document in this case (or any preferred file) with text in it that we can verify before and after the encryption/decryption process. Make sure it exists in the same folder as this Python script.
  3. We define our first function derive_key_and_iv(). This function accepts the password, salt, key_length and iv_length. The password is user defined; the salt is a random value mixed into the hash so that the same password yields a different key and IV for every file; key_length is the length of the encryption key in bytes; and iv_length is the length in bytes of our initialization vector (IV). The IV is a sequence of bytes that should be unpredictable to an attacker. It is as long as the block size (16 bytes in this case) and it ensures that distinct ciphertexts are produced even if the same plaintext is encrypted multiple times with the same key. The encryption mode we are using (MODE_CBC) requires an IV. Note that the function stretches the password and salt into key material by repeatedly applying MD5. This is simple, but MD5 is fast to compute, so the resulting key is only as strong as the password and offers little resistance to brute-force guessing. derive_key_and_iv() returns 2 values: the key and the IV. Both are used to create the cipher that encrypts the file.
  4. We define our second function encrypt(). This function accepts 4 parameters: in_file, out_file, password and key_length. The in_file is the plaintext file that we will encrypt (a Word document in this case), the out_file will be an encrypted file that won’t be readable by anyone who doesn’t have the password, the password is user defined and the key_length has a default value of 32 bytes (a 256-bit key).
    • In this function we call the derive_key_and_iv() function which will return the key and iv which we will use along with the mode to create the cipher.
    • The mode we are using here is called MODE_CBC, or Cipher Block Chaining. This mode of operation was invented in the 1970s and works by “XORing” each block of plaintext with the previous ciphertext block before it is encrypted. The very first block has no previous ciphertext block, so it is XORed with the IV instead. The value cipher is a CBC cipher object.
    • Recall that XOR means “exclusive or” which is a logical operation that is true if and only if its arguments are different i.e. one is true and the other is false.
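    • The salt is written to the start of out_file so that decrypt() can read it back later. in_file is then read from the disk and encrypted in chunks that are multiples of the cipher block size. The final chunk is always padded before encryption: if the file length is already a multiple of 16 bytes, a full block of padding is added, so that the padding can be removed unambiguously on decryption. Each encrypted chunk is written to out_file.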
    • The out_file will not be human readable because it is encrypted.
  5. We define our third function decrypt(). This function accepts 4 parameters: in_file, out_file, password and key_length. It does the same thing as encrypt() except in reverse: it reads the salt from the first 16 bytes of in_file, derives the same key and IV from the password, decrypts the rest in chunks and strips the padding from the final chunk. The in_file will be the previously encrypted file and out_file will be a decrypted file that we can read.
  6. The password we define here is CRITICAL. In practice your password should be long, complex and hard to guess. We used a simple one for this example.
  7. We use a with statement to open the plaintext file for encryption and we do the same to decrypt it.
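The padding and XOR chaining described in steps 4 and 5 can be sketched in pure Python. This is a toy illustration under stated assumptions, not how pycryptodomex implements CBC: toy_block_encrypt() and toy_block_decrypt() are hypothetical stand-ins for the AES block cipher and offer no security, and the key and IV are fixed values chosen only to keep the demo reproducible.

```python
BLOCK = 16  # AES block size in bytes


def pad(data: bytes) -> bytes:
    """Append 1 to 16 bytes, each holding the padding length (as encrypt() does)."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    """Read the padding length from the last byte and strip it (as decrypt() does)."""
    return data[:-data[-1]]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    # Stand-in for AES, NOT secure: XOR with the key, then rotate left by one byte.
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]


def toy_block_decrypt(block: bytes, key: bytes) -> bytes:
    # Undo the rotation, then undo the XOR with the key.
    return xor_bytes(block[-1:] + block[:-1], key)


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, b""
    padded = pad(plaintext)
    for i in range(0, len(padded), BLOCK):
        # XOR the plaintext block with the previous ciphertext block (or the IV).
        previous = toy_block_encrypt(xor_bytes(padded[i:i + BLOCK], previous), key)
        out += previous
    return out


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor_bytes(toy_block_decrypt(block, key), previous)
        previous = block
    return unpad(out)


key = bytes(range(16))   # fixed demo key, for reproducibility only
iv = bytes(16)           # fixed all-zero demo IV; a real IV must be unpredictable
message = b"A" * 32      # two identical plaintext blocks
ciphertext = cbc_encrypt(message, key, iv)
print(ciphertext[:16] != ciphertext[16:32])      # True: chaining hides the repetition
print(cbc_decrypt(ciphertext, key, iv) == message)  # True
```

Because every block is mixed with the ciphertext block before it, the two identical plaintext blocks encrypt to different ciphertext blocks, which is the property CBC is meant to provide.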

When the above code executes we will get 2 new files on disk in the same folder as our script in addition to infile.docx:

  1. outfile.docx – This file is the encrypted file. It won’t be readable, and Word will not be able to open it.
  2. outfile_decrypted.docx – This file is the decrypted file. This file should be identical to the original infile.docx file and all the contents should be the same and readable.
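Rather than comparing the files by eye, the round trip can be checked programmatically. The sketch below uses only the standard library; sha256_of_file(), files_match() and expected_encrypted_size() are hypothetical helper names, and the size formula assumes the layout produced by the encrypt() function above (a 16-byte salt followed by the padded ciphertext).

```python
import hashlib

SALT_SIZE = 16   # encrypt() writes a 16-byte salt first
BLOCK_SIZE = 16  # AES block size in bytes


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """Return True when both files have identical contents."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


def expected_encrypted_size(plain_size):
    """Salt + plaintext + padding (always 1 to 16 bytes of padding)."""
    padding = BLOCK_SIZE - plain_size % BLOCK_SIZE
    return SALT_SIZE + plain_size + padding
```

After running the script, files_match('infile.docx', 'outfile_decrypted.docx') should return True, and the size of outfile.docx should equal expected_encrypted_size() applied to the size of infile.docx.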

So that’s it! We have successfully encrypted and decrypted a file in Python. Word of caution though, this is only an example. In practice you should absolutely use stronger and more robust means to encrypt your data, along with a stronger password. Use this code at your own risk! This was for demonstration purposes only.
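One way to make the key derivation more robust is to replace the MD5 loop with PBKDF2 from the standard library. The sketch below is one possible substitute, not part of the original script: derive_key_and_iv_pbkdf2() is a hypothetical name, and the iteration count of 200,000 is an illustrative assumption rather than a recommendation.

```python
import hashlib
import os


def derive_key_and_iv_pbkdf2(password, salt, key_length=32, iv_length=16,
                             iterations=200_000):
    """Derive key and IV with PBKDF2-HMAC-SHA256 instead of chained MD5."""
    material = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        salt,
        iterations,                      # assumed value; higher is slower to brute-force
        dklen=key_length + iv_length,
    )
    return material[:key_length], material[key_length:]


salt = os.urandom(16)
key, iv = derive_key_and_iv_pbkdf2('a much longer passphrase', salt)
print(len(key), len(iv))  # 32 16
```

The many iterations make each password guess far more expensive for an attacker than a single MD5 call. Note that a file encrypted with the MD5-based derive_key_and_iv() cannot be decrypted with a key derived this way, so both encrypt() and decrypt() would have to use the same function.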
