SparkToPandas package
Submodules
SparkToPandas.pandas_plugin module
SparkToPandas Documentation
SparkToPandas is a simple plugin for Spark, designed to let you work with PySpark using a syntax closer to that of pandas.
- class SparkToPandas.pandas_plugin.spark_pandas(spark)
Bases: object
A collection of supporting functions for PySpark, with a syntax similar to pandas.
- barChart(df, x, y, hue, title, aspect='horizontal')
Plots a bar chart using the seaborn module.
- Parameters
df – dataframe
x – str
y – str
hue – str
title – str
aspect – str
- Returns
None
- change_schema(df, columns, dataType)
Function to change the schema of the table
- Parameters
df – dataframe
columns – list
dataType – list
- Returns
dataframe
- column_creator(df, primary_column, new_column_name, user_func)
Creates a new column based on a user-defined function and returns the resulting dataframe.
- Parameters
df – dataframe
primary_column – str
new_column_name – str
user_func – function
- Returns
dataframe
- describe(df, col=None)
Function to display the basic stats of the dataframe
- Parameters
df – dataframe
col – str
- Returns
display attr
- drop_na(df, col_name=None)
Drops null values based on the user’s choice. Supports dropping all null values, or only the null values in a given column subset.
- Parameters
df – dataframe
col_name – str
- Returns
dataframe
- fillna(df, value, col_name=None)
Fills null values based on the user’s choice.
- Parameters
df – dataframe
value – int/str/float
col_name – str
- Returns
dataframe
- head(df, n)
Prints the head and tail of the dataframe depending on user’s choice.
- Parameters
df – dataframe
n – int
- Returns
None
- print_schema(df)
Function to print the schema of the table.
- Parameters
df – dataframe
- Returns
Schema
- read_csv(file_location, header=True)
Function to read a CSV file as a Spark RDD.
- Parameters
file_location – str
header – bool
- Returns
rdd
- read_excel(file_location, sheet_name)
Function to read an Excel sheet.
- Parameters
file_location – str
sheet_name – str
- Returns
dataframe
- read_json(file_location)
Function to read JSON data.
- Parameters
file_location – str
- Returns
json obj