Posts

SCD Type 2 in MySQL/ coforge

 To implement SCD Type 2 in MySQL , you typically use SQL scripts to: Detect changes between source and target Expire old rows in the dimension table Insert new rows with updated values Assume two tables: customer_src (source/staging table) customer_dim (dimension table with SCD Type 2 tracking) CREATE TABLE customer_dim (     customer_sk INT AUTO_INCREMENT PRIMARY KEY,     customer_id INT,     name VARCHAR(100),     address VARCHAR(255),     start_date DATE,     end_date DATE,     is_current CHAR(1) -- 'Y' or 'N' ); Step 1: Identify Changes (New or Modified Data) -- Find records in source that are new or have changed CREATE TEMPORARY TABLE scd2_changes AS SELECT s.* FROM customer_src s LEFT JOIN customer_dim d   ON s.customer_id = d.customer_id AND d.is_current = 'Y' WHERE d.customer_id IS NULL    OR s.name <> d.name    OR s.address <> d.address; Step 2: Expi...

collect_list( opposite of explode)

 from pyspark.sql.functions import collect_list from pyspark.sql import SparkSession data1 = [     (1,"apple"),     (1,"banana"),     (1,"cherry"),     (2,"orange"),     (3,"cherry") ] spark = SparkSession.builder.appName("StreamingExample").getOrCreate() df_exploded = spark.createDataFrame(data1, ["id", "fruit"]) df_exploded.show() df_exploded.groupBy("id").agg(collect_list("fruit").alias("fruits")).show()

Demonstrate Explode

 from pyspark.sql import SparkSession from pyspark.sql.functions import explode #spark = SparkSession.builder.appName("StructuredStreaming").getOrCreate() spark = SparkSession.builder.appName("StreamingExample").getOrCreate() data = [     (1, ["apple","banana","cherry"]),     (2, ["orange","cherry"]),     (3, []) ] df = spark.createDataFrame(data, ["id", "fruit"]) df.select("id",explode("fruit").alias("fruits")).show()

3 days running average / capco

 data = [Row(Date='2023-01-01', ProductID=100, QuantitySold=10),         Row(Date='2023-01-02', ProductID=100, QuantitySold=15),         Row(Date='2023-01-03', ProductID=100, QuantitySold=20),         Row(Date='2023-01-04', ProductID=100, QuantitySold=25),         Row(Date='2023-01-05', ProductID=100, QuantitySold=30),         Row(Date='2023-01-06', ProductID=100, QuantitySold=35),         Row(Date='2023-01-07', ProductID=100, QuantitySold=40),         Row(Date='2023-01-08', ProductID=100, QuantitySold=45)] ########################################################################### I have the following sales data for a product, with the quantity sold on each date: Date ProductID QuantitySold 2023-01-01 100 10 2023-01-02 100 15 2023-01-03 100 20 2023-01-04 100 25 2023-01-05 100 30 2023-01-06 100 35...

running total

  🎯 Goal: Calculate a running total of a column (like sales_amount ) using: ✅ Self Join ✅ Window Function (modern/easier way) Let's assume a table: daily_sales sale_date sales_amount 2025-06-01 100 2025-06-02 120 2025-06-03 130 2025-06-04 110 🔁 1. Running Total Using Self Join (Old School SQL) sql Copy Edit SELECT a.sale_date, a.sales_amount, SUM (b.sales_amount) AS running_total FROM daily_sales a JOIN daily_sales b ON b.sale_date <= a.sale_date GROUP BY a.sale_date, a.sales_amount ORDER BY a.sale_date; 🔍 How it works: For each row a , join all rows b that occurred on or before a.sale_date Then sum all b.sales_amount ⚡ 2. Running Total Using Window Function (Modern SQL) sql Copy Edit SELECT sale_date, sales_amount, SUM (sales_amount) OVER ( ORDER BY sale_date ) AS running_total FROM daily_sales; ✅ Simpler, faster, and more readable. Works in: PostgreSQL, SQL Server, Oracle, MySQL 8+, etc. 📊 Sample O...

SQL Question: 7-Day Moving Average

  📄 Table: daily_sales sale_date sales_amount 2025-06-01 100 2025-06-02 120 2025-06-03 130 2025-06-04 110 2025-06-05 150 2025-06-06 140 2025-06-07 160 2025-06-08 170 2025-06-09 180 ❓ Question: Write an SQL query to calculate the 7-day moving average of sales_amount for each sale_date . The moving average should include the current day and the 6 previous days. Return: sale_date sales_amount moving_avg_7d ✅ Answer: SELECT sale_date, sales_amount, ROUND( AVG (sales_amount) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ), 2 ) AS moving_avg_7d FROM daily_sales; 📊 Output: sale_date sales_amount moving_avg_7d 2025-06-01 100 NULL 2025-06-02 120 NULL 2025-06-03 130 NULL 2025-06-04 110 NULL 2025-06-05 150 NULL 2025-06-06 140 NULL 2025-06-07 160 130.00 2025-06-08 170 140.00 2025-06-09 180 148.57

sql question set 1

✅ SQL Question #1: Repeating Rows Based on Value Input Table : A single-column table with values: Value ----- 1 2 3 Expected Output : Each value should be repeated as many times as its value: Value ----- 1 2 2 3 3 3 Question : Write an SQL query to return each number repeated as many times as its value. For example, 1 should appear once, 2 should appear twice, 3 thrice, and so on. ✅ SQL Question #2: Get the Start and End Cities per Journey Input Table ( TravelRoutes ): Id From To 1 Hyderabad Bangalore 1 Bangalore Chennai 1 Chennai Kochi 2 Bangalore Hyderabad 2 Hyderabad Delhi 3 Delhi Chennai Expected Output : Id From To 1 Hyderabad Kochi 2 Bangalore Delhi 3 Delhi Chennai Question : Write an SQL query to find the starting and ending cities of each journey ( Id ). Assume the path is connected and ordered — you need to find the first From city and the last To city for each Id .