Python in Data Science
(Note: This is a work in progress and new information is being added.)
Input / Output
- Open, read and write .csv or .txt file
- Pandas: Read and write .csv or .txt file into a DataFrame
- Write output to a .txt file in a for loop
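
As a rough illustration of the plain-Python items above, the sketch below writes and reads a small .csv file with the standard `csv` module and then writes a .txt file inside a for loop. The file names, the temporary directory, and the sample rows are assumptions made up for this example.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical file names inside a temporary directory (illustration only).
work_dir = Path(tempfile.mkdtemp())
csv_path = work_dir / "example.csv"
txt_path = work_dir / "example.txt"

# Made-up sample data; the first row acts as the header.
rows = [["name", "score"], ["alice", "90"], ["bob", "85"]]

# Write the rows to a .csv file; newline="" avoids blank lines on Windows.
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the .csv file back as a list of lists of strings.
with open(csv_path, newline="") as f:
    rows_read = list(csv.reader(f))

# Write output to a .txt file in a for loop, one line per iteration.
with open(txt_path, "w") as f:
    for i in range(3):
        f.write(f"line {i}\n")

# Read the .txt file back and split it into lines.
txt_lines = txt_path.read_text().splitlines()
print(rows_read)
print(txt_lines)
```

With pandas, `pd.read_csv` and `DataFrame.to_csv` cover the same round trip for tabular data.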
Install
- Install library or module using pip
- Check version of installed library
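
Installing is normally done from a shell (for example `python -m pip install pandas`). Checking a version can also be done from Python itself; the sketch below is one possible approach using the standard `importlib.metadata` module (Python 3.8+). The helper name `installed_version` and the package names are hypothetical, chosen only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("pandas"))

# A made-up package name, used to show the "not installed" case.
print(installed_version("hypothetical-package-that-does-not-exist"))
```

Many libraries also expose a `__version__` attribute, for example `pandas.__version__`.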
Basics
- Pandas: Create a Series
- Pandas: Create a Series with custom index
- Pandas: Create a Series using a list
- Pandas: Create a Series using a dictionary
- Pandas: Create a Series using a tuple
- Pandas: Create a DataFrame
- Pandas: Create a DataFrame using random values
- Pandas: Create a DataFrame using a Series
- Pandas: Create a DataFrame using a list
- Pandas: Create a DataFrame using an array
- Pandas: Create a DataFrame using a dictionary
- Pandas: Create a DataFrame using a tuple
- Pandas: Create a DataFrame using a for loop
- Pandas: Create a DataFrame with custom index
- Pandas: Create a DataFrame with date and time as index
- Pandas: Create a DataFrame with MultiIndex for rows
- Pandas: Create a DataFrame with MultiIndex for columns
- Pandas: Create a DataFrame with NaN, None, NaT
- Pandas: Update a DataFrame from another DataFrame
- Pandas: Get shape, size, type, dtypes of a DataFrame
- Pandas: Sort a DataFrame
- Numpy: Create 1D, 2D, 3D arrays with random values
- Numpy: Create an array with evenly spaced values
- Numpy: Create an identity array
- Numpy: Create a matrix
- Numpy: Delete row or column from an array
- Numpy: Numerical ranges: linspace, arange
- Pandas: Convert Series to numpy array
- Pandas: Convert DataFrame to numpy array
- Pandas: Pivot a DataFrame
- Pandas: Stack a DataFrame, unstack a Series
- Pandas: Merge or join DataFrames
- Pandas: Shift or lead or lag
- Pandas: Split a DataFrame column
- Pandas: Convert dtype of a column
- Pandas: Count NaNs in a DataFrame
- Pandas: Check memory occupied by DataFrame
- Comment in Python
- Nested list comprehension
- Numpy: np.where()
- Check if an item exists in a list
- Get length of a list
- Create a tuple
- Create a set
- Create a dictionary
- Create an empty dictionary
- Create a dictionary using tuples
- Dictionary comprehension
- For loop
- Exit a for loop
- Enumerate in for loop
- Zip in for loop
- For loop in reverse order
- For loop in sorted order
- Get data type
- Arithmetic operators
- Comparison operators
- Logical and bitwise operators
- Create a list
- Create an empty list
- Append item to a list
- Extend a list
- Insert an item in a list
- Remove an item from a list
- Pop item from a list
- Clear a list
- Remove / delete an item from a list by value
- Delete items from a list by index
- Count repeats of an item in a list
- Sort a list
- Reverse a list
- Iterate over a list
- Chain lists
- Create copy of a list
- List comprehension
- Numpy: Stack arrays
- Numpy: Concatenate arrays
- Numpy: Split arrays
- Numpy: Slice an array
- Numpy: Delete items from array
- Numpy: Get shape and size of an array
- Numpy: Get array data type
- Numpy: Create an array
- Numpy: Create a blank and empty array
- Numpy: Create an array of zeros
- Numpy: Create an array of ones
- Numpy: Create a full array
- Numpy: Create copy of an array
- Numpy: Sort an array
- Numpy: Transpose an array
- Numpy: Reshape an array
- Numpy: Append items to array
- Numpy: Insert items in an array
- any(), all()
- Create Timestamp and Localize
- Function
- While loop
- Count NaN in a list
- Numpy: Count NaN in an array
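
Several of the plain-Python basics listed above (comprehensions, `enumerate`, `zip`, sorted and reversed iteration, membership tests) can be sketched together in a few lines. The names and scores are made-up sample data, not taken from any real dataset.

```python
# Made-up sample data for illustration.
names = ["carol", "alice", "bob"]
scores = [72, 90, 85]

# List comprehension: squares of 0..4.
squares = [n * n for n in range(5)]

# Nested list comprehension: all (i, j) pairs for i, j in {0, 1}.
pairs = [(i, j) for i in range(2) for j in range(2)]

# Dictionary comprehension combined with zip.
score_by_name = {name: score for name, score in zip(names, scores)}

# enumerate yields (index, item) pairs.
indexed = [(i, name) for i, name in enumerate(names)]

# Iterate in sorted order and in reverse order without changing the list.
sorted_names = [name for name in sorted(names)]
reversed_names = [name for name in reversed(names)]

# Check if an item exists in a list, and get the length of a list.
has_bob = "bob" in names
count = len(names)

# Exit a for loop early with break once a condition is met.
first_high_scorer = None
for name, score in zip(names, scores):
    if score >= 85:
        first_high_scorer = name
        break
```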
Mathematical and Aggregate Functions
- Numpy: Arithmetic functions
- Numpy: Aggregate functions: row-wise and column-wise
- Pandas: Aggregate function: row-wise and column-wise
- Pandas: Calculate cumulative sum and mean for a Series or DataFrame
- Pandas: Calculate rolling sum and mean for a Series or DataFrame
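
A minimal sketch of the pandas items above, assuming pandas is installed: column-wise and row-wise aggregation on a small DataFrame, then cumulative and rolling statistics on a Series. The values and the window size of 2 are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
s = pd.Series([1, 2, 3, 4, 5])

# axis=0 aggregates down each column; axis=1 aggregates across each row.
column_totals = df.sum(axis=0)
row_totals = df.sum(axis=1)

# Cumulative sum, and cumulative mean via an expanding window.
cumulative_sum = s.cumsum()
cumulative_mean = s.expanding().mean()

# Rolling sum and mean over a window of 2 observations.
# The first entry is NaN because the window is not yet full.
rolling_sum = s.rolling(window=2).sum()
rolling_mean = s.rolling(window=2).mean()
```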
Date and Time
- Pandas: Create a timestamp
- Pandas: Create a date range
- Pandas: Create a date range of fixed frequency
- Pandas: Create a date range of random frequency
- Pandas: Create random dates
- Pandas: Add or subtract year, month, day, hour, minute, second from a timestamp
- Pandas: Convert string to a timestamp
- Pandas: Convert naive time to aware time i.e. add timezone to a timestamp
- Pandas: Resample time series
- Pandas: Group by using time series
- Pandas: Calculate difference between two timestamps
- Pandas: Shift or lead or lag time series
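
The date and time items above could look roughly like the following sketch, assuming pandas is installed. All dates, the UTC time zone, and the two-day resampling frequency are arbitrary values picked for the example.

```python
import pandas as pd

# Create a timestamp and convert a string to a timestamp.
ts = pd.Timestamp("2024-01-15 08:30:00")
parsed = pd.to_datetime("2024-03-01")

# Add one calendar month and two hours to a timestamp.
later = ts + pd.DateOffset(months=1) + pd.Timedelta(hours=2)

# Convert naive time to aware time by attaching a time zone.
aware = ts.tz_localize("UTC")

# Difference between two timestamps is a Timedelta.
delta = parsed - ts

# Date range of fixed daily frequency, used as a Series index.
dates = pd.date_range(start="2024-01-01", periods=4, freq="D")
series = pd.Series([1, 2, 3, 4], index=dates)

# Resample into two-day bins and sum the values in each bin.
every_two_days = series.resample("2D").sum()

# Lag the series by one period; the first value becomes NaN.
lagged = series.shift(1)
```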
Data Visualization
- Matplotlib: Create a bar, scatter, and line plot
- Matplotlib: Create a box plot and histogram
- Seaborn: Create a bar, scatter, and line plot
- Seaborn: Create a box plot and histogram
- Seaborn: Create a scatter plot matrix
- Pandas: Create a scatter plot matrix
- Seaborn: Create a heatmap
- Matplotlib: Create a heatmap
- Matplotlib: Plot error bars
- Seaborn: Plot error bars
- Matplotlib: Save a plot as .png or .jpeg
- Matplotlib: Create subplots