SparkSql Series (5/25): Using case when


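Almost every programming language has something like case when; in Scala the usual counterpart is the case syntax. It is conditional logic: picture a long list of conditions, each of which runs a different statement.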
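As a point of comparison, here is a minimal plain-Scala sketch of the case syntax mentioned above. It does not use Spark, and `genderLabel` is a hypothetical helper name chosen only for illustration.

```scala
// Plain Scala, no Spark required: map a gender code to a label with match/case.
// genderLabel is a hypothetical helper used only for this illustration.
def genderLabel(code: String): String = code match {
  case "M" => "Male"
  case "F" => "Female"
  case _   => "Unknown" // default branch, similar to ELSE in SQL
}

val labels = List("M", "F", "").map(genderLabel)
println(labels) // prints List(Male, Female, Unknown)
```

The rest of this post expresses the same branching on DataFrame columns rather than on single values.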

First, create a DataFrame:

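import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local single-threaded session used by all examples below
val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExamples.com")
      .getOrCreate()

// Brings toDF and the other implicit conversions into scope
import spark.implicits._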
val data = List(("James","","Smith","36636","M",60000),
        ("Michael","Rose","","40288","M",70000),
        ("Robert","","Williams","42114","",400000),
        ("Maria","Anne","Jones","39192","F",500000),
        ("Jen","Mary","Brown","","F",0))

val cols = Seq("first_name","middle_name","last_name","dob","gender","salary")
val df = spark.createDataFrame(data).toDF(cols:_*)

when otherwise

val df2 = df.withColumn("new_gender", when(col("gender") === "M","Male")
      .when(col("gender") === "F","Female")
      .otherwise("Unknown"))

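The example above creates a new column by chaining two when checks. You could of course write a UDF instead, since a UDF can do anything. But then you have to write the function and register it, which is tedious.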
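For comparison, a UDF version of the same mapping might look like the sketch below. The DataFrame `people` is a small stand-in for the article's `df`, and `genderLabelUdf` is a hypothetical name. When the UDF is called through the DataFrame API, as here, wrapping the function with `udf` is enough; `spark.udf.register` is only needed if you want to call it from a SQL string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("udf-sketch")
  .getOrCreate()
import spark.implicits._

// Small stand-in for the article's df (assumption: only the columns needed here)
val people = Seq(("James", "M"), ("Maria", "F"), ("Robert", ""))
  .toDF("first_name", "gender")

// Hypothetical UDF: the same three-way mapping as when/otherwise above
val genderLabelUdf = udf((g: String) => g match {
  case "M" => "Male"
  case "F" => "Female"
  case _   => "Unknown" // also covers null input
})

val withUdf = people.withColumn("new_gender", genderLabelUdf(col("gender")))
withUdf.show()
```

Spark's optimizer treats a UDF as a black box, so for a simple mapping like this one the built-in when/otherwise is usually the better choice.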

case when

val df3 = df.withColumn("new_gender", 
      expr("case when gender = 'M' then 'Male' " +
                       "when gender = 'F' then 'Female' " +
                       "else 'Unknown' end"))

Or:

val df4 = df.select(col("*"),
      expr("case when gender = 'M' then 'Male' " +
                       "when gender = 'F' then 'Female' " +
                       "else 'Unknown' end").alias("new_gender"))

&& and || operators

These two operators are logical AND and OR; they let you combine several conditions into one compound test.

val dataDF = Seq(
      (66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
      ).toDF("id", "code", "amt")
dataDF.withColumn("new_column",
       when(col("code") === "a" || col("code") === "d", "A")
      .when(col("code") === "b" && col("amt") === "4", "B")
      .otherwise("A1"))
      .show()
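The same compound conditions can be written as a SQL expression, where `or` and `and` play the roles of `||` and `&&`. This is a sketch that rebuilds the same sample data so it can run on its own; `viaExpr` is a hypothetical name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("compound-condition-sketch")
  .getOrCreate()
import spark.implicits._

// Same sample rows as in the example above
val dataDF = Seq((66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4"))
  .toDF("id", "code", "amt")

// SQL form of the compound conditions: OR for ||, AND for &&
val viaExpr = dataDF.withColumn("new_column",
  expr("case when code = 'a' or code = 'd' then 'A' " +
       "when code = 'b' and amt = '4' then 'B' " +
       "else 'A1' end"))
viaExpr.show()
```

In the Column version, Scala parses `===` before `&&`, and `&&` before `||`, so the conditions above need no extra parentheses; adding them is still a reasonable habit once the conditions get longer.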
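Copyright notice: original article from this site, published by admin on 2021-06-07, 1341 characters in total.
Reprint notice: unless otherwise stated, articles on this site are published under the CC-4.0 license; please credit the source when reprinting.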