PySpark Memo 3
Chapter 2
Content
- Filter Operation
- including: &, |, ==, ~
Codes:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('DataFrame').getOrCreate()
df_spark = spark.read().csv('test.csv', header=True, inferSchema=True)
# filter first format:
df_spark.filter('Salary<=2000')
# filter second format:
df_spark.filter(df_spark['Salary'] <= 2000)
# filter and
df_spark.filter((df_spark['Salary'] <= 2000) & (df_spark['Salary'] >= 1500))
# you can use or as |
# filter ~ is also the same