Purpose and Scope: Using PySpark and the Spark ML library, a regression model will be built on the dataset below to predict housing prices.

Dataset: California Housing Prices https://www.kaggle.com/datasets/camnugent/california-housing-prices

Environment: The project was developed in a Kaggle notebook using Python, with the PySpark library at its core.

Setup & Data Loading:

%pip install pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
import pandas as pd
import numpy as np

# Start the Spark session
spark = SparkSession.builder \
    .appName("California Housing Prices Regression") \
    .getOrCreate()

# Load the dataset: header=True takes column names from the first row,
# inferSchema=True lets Spark detect column types (at the cost of one extra pass over the file)
file_path = "/kaggle/input/california-housing-prices/housing.csv"
housing_data = spark.read.csv(file_path, header=True, inferSchema=True)
SparkSession - in-memory
SparkContext
Version: v3.5.1
Master: local[*]
AppName: California Housing Prices Regression

Exploratory Data Analysis:

Preliminary Data Inspection:

housing_data.show(5)

+---------+--------+------------------+-----------+--------------+----------+----------+-------------+------------------+---------------+
|longitude|latitude|housing_median_age|total_rooms|total_bedrooms|population|households|median_income|median_house_value|ocean_proximity|
+---------+--------+------------------+-----------+--------------+----------+----------+-------------+------------------+---------------+
|  -122.23|   37.88|              41.0|      880.0|         129.0|     322.0|     126.0|       8.3252|          452600.0|       NEAR BAY|
|  -122.22|   37.86|              21.0|     7099.0|        1106.0|    2401.0|    1138.0|       8.3014|          358500.0|       NEAR BAY|
|  -122.24|   37.85|              52.0|     1467.0|         190.0|     496.0|     177.0|       7.2574|          352100.0|       NEAR BAY|
|  -122.25|   37.85|              52.0|     1274.0|         235.0|     558.0|     219.0|       5.6431|          341300.0|       NEAR BAY|
|  -122.25|   37.85|              52.0|     1627.0|         280.0|     565.0|     259.0|       3.8462|          342200.0|       NEAR BAY|
+---------+--------+------------------+-----------+--------------+----------+----------+-------------+------------------+---------------+
only showing top 5 rows