Show Full Column Content in a Spark DataFrame

Spark Show Full Column Content Without Truncation:



As developers, we often work with Spark interactively, either in Scala through spark-shell or in Python through PySpark. While checking results interactively in either language, we often notice that long column values in a DataFrame are displayed in truncated form. Here we look at a simple Apache Spark trick that helps big data developers view the complete column content in the output of show(), without truncation.

Let's walk through the trick with a small illustration in PySpark. As a first step, create a dummy Spark DataFrame with a single column whose text is long enough to be truncated when the DataFrame is displayed with the show() function. To create this input DataFrame and reproduce the truncation issue, we will read a sample CSV file with one column in it, as shown below.

Input File:
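
The sample file itself is not reproduced here. As a minimal sketch, a comparable one-column file can be generated with Python's csv module. The column name Email_address and the file name input_file.csv come from the snippets in this post; the addresses are made-up placeholders, not the original data.

```python
import csv

# Hypothetical sample addresses (not the original data).
# The first two are 30 characters long, so show() will truncate them by default.
rows = [
    ["alexander.hamilton@example.com"],
    ["elizabeth.schuyler@example.org"],
    ["short@example.com"],
]

with open("input_file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Email_address"])  # header row, picked up by header='true'
    writer.writerows(rows)
```

Run this in the same directory as the notebook so that the relative path used by spark.read.csv resolves.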




Program:

Open a new Jupyter notebook. If you are new to Spark and still need to set it up with Jupyter on a Windows machine, complete that setup first and then return to this page. Next, create an entry point to Spark through a SparkSession:

#Create an entry point
from pyspark.sql import SparkSession
spark = SparkSession.builder\
                    .master("local")\
                    .appName('fullcolumncontent')\
                    .getOrCreate()

Now we can go ahead and create the input DataFrame by reading the CSV file, using the snippet below.

#Read the input csv file as data frame
input=spark.read.csv('input_file.csv',header='true')


Out[]:
DataFrame[Email_address: string]

Here we can see that "input" is a DataFrame with one column named Email_address of type string. To view the content of the DataFrame, run the following snippet.

#Display the Dataframe output interactively
input.show()


Notice that the output of show() displays truncated email addresses. By default, show() shortens string values longer than 20 characters.
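
To make the truncation rule concrete, here is a plain-Python imitation of it: with the default width of 20, a longer string is cut to its first 17 characters and "..." is appended. This is only an illustrative sketch; truncate_cell is a hypothetical helper, not part of the PySpark API, and the email address is a made-up value.

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Imitate how show() shortens one cell (hypothetical helper, not a PySpark API)."""
    # A width of 0 or less means "do not truncate"
    if truncate <= 0 or len(value) <= truncate:
        return value
    # Very small widths leave no room for an ellipsis, so the text is simply cut
    if truncate < 4:
        return value[:truncate]
    # Otherwise keep (truncate - 3) characters and append "..."
    return value[: truncate - 3] + "..."


email = "alexander.hamilton@example.com"  # made-up sample value
print(truncate_cell(email))     # alexander.hamilto...
print(truncate_cell(email, 0))  # alexander.hamilton@example.com
```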


Solution:

To view the full content of a column in a Spark DataFrame, pass truncate=0 (or, equivalently, truncate=False) to show():

#Display the Dataframe output without truncation
input.show(truncate=0)


In Scala, the same result is obtained by passing false to show():

 df.show(false)

 where df is a DataFrame.

Output:


The output of the above code displays the complete email addresses, without truncation.


Hope you learnt a useful trick in Spark. Don't forget to stop the SparkSession once you are done, by calling spark.stop().



Try this in your own development, and leave a comment if you face any challenges or issues while executing the above steps.

Happy Learning !!!
