01 Oct How to import multiple files in Python
Most of the time in Python, you import just one file at a time, either with a pandas reader such as pd.read_csv(filename) or with the built-in open() and read() functions. But news flash: you can actually do more! In this article, I am going to show you how to import multiple files into Python.
Please note that the environment I used for this process is Jupyter Notebook.
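Pandas can read a number of file types, and Jupyter Notebook will list them for you through tab completion. The exact list depends on your pandas version (for example, msgpack support was removed in pandas 1.0). These file types include: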
1. clipboard
2. csv
3. excel
4. feather
5. fwf
6. gbq
7. hdf
8. html
9. json
10. msgpack
11. parquet
12. sas
13. sql
14. sql_query
15. sql_table
16. stata
17. table
To see this list in your Jupyter notebook, this is all you have to do:
1. Be sure you have pandas installed (run this in your command prompt or Anaconda Prompt)
pip install pandas
2. Import pandas into your Jupyter notebook
import pandas as pd
3. Start typing a reader function and check which other file formats can be read in Python
data = pd.read_
(the file format goes after the underscore, for example pd.read_csv(filename))
After the underscore (_), press the Tab key on your keyboard to see the available options.
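If you would rather get the same list in code than through the Tab key, here is a minimal sketch. It simply filters the names that pandas exposes for those starting with read_. The variable name readers is my own choice, and the output will differ from one pandas version to another.

```python
import pandas as pd

# Collect every name in the pandas namespace that starts with "read_".
# The exact list varies between pandas versions.
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

for name in readers:
    print(name)
```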
Importing multiple files in Python
Importing multiple files in Python starts with a module called glob.
glob is a module that finds the files whose names match a pattern you give it. It does not read the files itself; it returns their names. It is used with wildcards to pick out specific file types, so that files you do not need stay out of your Python notebook.
glob is part of Python's standard library, so there is nothing to install with pip. You only need to import it.
Importing glob into Python (Anaconda)
import glob
Importing all the files in your current directory
myfiles = [f for f in glob.glob('*')]
Note: * is a wildcard which denotes all. This collects the names of all the files and folders in the current directory (except hidden ones whose names start with a dot) into a Python list.
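One thing to keep in mind is that glob.glob('*') returns folders as well as files. If you only want regular files, a small filter with os.path.isfile does the job. The sketch below builds a throwaway folder with made-up names (notes.txt, data1.xlsx, archive) so it can run anywhere; with your own data you would point it at your own directory instead.

```python
import glob
import os
import tempfile

# Build a throwaway folder so the example does not depend on your own files.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "notes.txt"), "w").close()
    open(os.path.join(folder, "data1.xlsx"), "w").close()
    os.mkdir(os.path.join(folder, "archive"))

    # '*' matches every non-hidden entry, including the sub-folder.
    everything = glob.glob(os.path.join(folder, "*"))

    # Keep regular files only and drop the folder.
    only_files = sorted(
        os.path.basename(path) for path in everything if os.path.isfile(path)
    )

print(only_files)
```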
Importing all Excel files in Python
myfiles = [f for f in glob.glob('*.xlsx')]
The above code can also be written as an ordinary for loop, as shown below:
myfiles = []  # this is an empty list
Creating the for loop
for each_file in glob.glob('*.xlsx'):
    myfiles.append(each_file)
print(myfiles)
This will give you the names of all the Excel files (.xlsx) in your current directory.
On occasions where there are files like data1.xlsx, data2.xlsx, data3.xlsx, data21.xlsx, data22.xlsx, etc., we can use the ? wildcard, which matches exactly one character, to pick certain files.
mynewfiles = []
for each_file in glob.glob('data?.xlsx'):
    mynewfiles.append(each_file)
print(mynewfiles)
The code above will give us the files named data1.xlsx, data2.xlsx and data3.xlsx, but not data21.xlsx or data22.xlsx, even though those are also Excel files (.xlsx).
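To see the ? wildcard at work without needing any real files, you can try the pattern on a plain list of names. This sketch uses the standard library's fnmatch module instead of glob; glob relies on the same matching rules, but fnmatch works on strings rather than on a directory. The file names are just the hypothetical ones from the example above.

```python
import fnmatch

# Hypothetical file names, mirroring the ones mentioned above.
names = ["data1.xlsx", "data2.xlsx", "data3.xlsx", "data21.xlsx", "data22.xlsx"]

# '?' stands for exactly one character, so the two-digit names are skipped.
single = fnmatch.filter(names, "data?.xlsx")

# Two '?' in a row match exactly two characters.
double = fnmatch.filter(names, "data??.xlsx")

print(single)
print(double)
```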
Import files with a range of numbers
mynewfiles2 = []  # this is an empty list
for each_file in glob.glob('data[0-9].xlsx'):
    mynewfiles2.append(each_file)
print(mynewfiles2)
This will return the files that have a single digit from the specified range (here 0 to 9) in the specified position, such as data1.xlsx, but not data21.xlsx.
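The range inside the square brackets can be narrowed, and two ranges can be placed side by side to match two-digit names. Here is a small sketch under the assumption of a temporary folder filled with empty, made-up files (data1.xlsx, data5.xlsx and so on), so that it runs without touching your own directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create empty placeholder files; the names are made up for the demo.
    for name in ["data1.xlsx", "data5.xlsx", "data9.xlsx", "data21.xlsx", "dataA.xlsx"]:
        open(os.path.join(folder, name), "w").close()

    def matching(pattern):
        """Return the sorted file names in the demo folder that match pattern."""
        paths = glob.glob(os.path.join(folder, pattern))
        return sorted(os.path.basename(path) for path in paths)

    one_digit = matching("data[0-9].xlsx")        # any single digit
    one_to_five = matching("data[1-5].xlsx")      # only the digits 1 to 5
    two_digits = matching("data[0-9][0-9].xlsx")  # exactly two digits

print(one_digit)
print(one_to_five)
print(two_digits)
```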
Placeholders
Using placeholders can make your code more readable and understandable.
Combining {} with the string method .format() helps in achieving this.
Importing only CSV files
csv_files = []
for each_file in glob.glob('*.{}'.format('csv')):
    csv_files.append(each_file)
print(csv_files)
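Once you have the list of CSV file names, a natural next step is to load them all with pandas and stack them into one table. The sketch below is one way to do that, not the only one. It assumes every file has the same columns, and it writes two tiny made-up files (jan.csv and feb.csv) to a temporary folder so that it runs anywhere.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Write two tiny CSV files so the sketch does not need your own data.
    samples = [
        ("jan.csv", "id,amount\n1,10\n2,20\n"),
        ("feb.csv", "id,amount\n3,30\n4,40\n"),
    ]
    for name, rows in samples:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(rows)

    csv_files = sorted(glob.glob(os.path.join(folder, "*.{}".format("csv"))))

    # Read each file into its own DataFrame, then stack them into one table.
    frames = [pd.read_csv(path) for path in csv_files]
    combined = pd.concat(frames, ignore_index=True)

print(combined)
```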
Most of the time, it is preferable to assign the file format to a variable.
Importing an XML file
fileformat = 'xml'
xml_files = []
for each_file in glob.glob('*.{}'.format(fileformat)):
    xml_files.append(each_file)
print(xml_files)
You can list the files of any format in Python using the above method.
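Since the only thing that changes between these examples is the file format, the pattern can be wrapped in a small function. This is just a sketch: the function name files_with_extension and its folder parameter are my own invention, and the demo files (a.xml, b.xml, c.csv) are created in a temporary folder purely for illustration.

```python
import glob
import os
import tempfile


def files_with_extension(extension, folder="."):
    """Return the sorted paths in folder whose names end with .extension."""
    pattern = os.path.join(folder, "*.{}".format(extension))
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as folder:
    # Empty placeholder files with made-up names.
    for name in ["a.xml", "b.xml", "c.csv"]:
        open(os.path.join(folder, name), "w").close()

    xml_names = [os.path.basename(p) for p in files_with_extension("xml", folder)]
    csv_names = [os.path.basename(p) for p in files_with_extension("csv", folder)]

print(xml_names)
print(csv_names)
```

Calling files_with_extension('xlsx') with no folder argument would search your current directory, just like the loops above.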
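To read these files, you can use the open and read functions in Python as seen below:
for each_data in xml_files:
    with open(each_data, 'r') as f:  # the file is closed automatically afterwards
        print(f)         # the file object itself
        print(f.read())  # the contents of the file
For each file, this prints the file object followed by the contents of the file (the XML file).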
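If you want to keep the contents around instead of only printing them, one option is to store them in a dictionary keyed by file name. The sketch below assumes small text files that fit comfortably in memory and that are encoded as UTF-8. The two XML files are made up and written to a temporary folder so the example is self-contained.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Two small made-up XML files for the demo.
    samples = {"a.xml": "<note>first</note>", "b.xml": "<note>second</note>"}
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write(text)

    contents = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        # The with-block closes each file as soon as it has been read.
        with open(path, "r", encoding="utf-8") as handle:
            contents[os.path.basename(path)] = handle.read()

for name, text in contents.items():
    print(name, "->", text)
```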