Filling rows from calendar table with previous values - sql

I'm new to SQL, coming over from Python and R, and using Spark SQL with Databricks. I'm trying to complete a basic query and would appreciate guidance, especially guidance that explains the underlying concepts of SQL as they relate to my question.
I have a calendar table with complete, consecutive dates, and a data table with date_added, user_id, sales, and price columns. The data table has incomplete dates, since not every user is active on every date. Below are examples of each table.
Calendar Table
date
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05
2020-01-06
Data Table
date_added user_id sales price
2020-01-02 01 1 4.00
2020-01-05 01 3 4.00
2020-01-02 02 1 5.00
2020-01-03 02 1 5.00
2020-01-05 02 2 5.00
2020-01-03 03 2 1.00
2020-01-05 03 5 1.00
I am looking to create a new table where every calendar date within a certain range (the active dates) is present for every user, and where null values in every column except sales are filled with the next available value in that column. Something along these lines:
date user_id sales price
2020-01-02 01 1 4.00
2020-01-03 01 null 4.00
2020-01-04 01 null 4.00
2020-01-05 01 3 4.00
2020-01-02 02 1 5.00
2020-01-03 02 1 5.00
2020-01-04 02 null 5.00
2020-01-05 02 2 5.00
2020-01-02 03 null 1.00
2020-01-03 03 2 1.00
2020-01-04 03 null 1.00
2020-01-05 03 5 1.00
Any guidance on how I might produce this output is appreciated. I've tried to use a LEFT JOIN on the dates, but without success. I know that the UNION operator is used to concatenate tables on top of one another, but I don't know how I would apply that method here.

You can cross join the users with the calendar table, then left join with the data table:
spark.sql("""
SELECT date, dates.user_id, sales, COALESCE(data.price, dates.price) AS price
FROM (
    SELECT user_id, price, date
    FROM (SELECT user_id, FIRST(price) AS price FROM data_table GROUP BY user_id)
    CROSS JOIN calendar_table
    WHERE date >= (SELECT MIN(date_added) FROM data_table)
      AND date <= (SELECT MAX(date_added) FROM data_table)
) dates
LEFT JOIN data_table data
    ON dates.user_id = data.user_id
    AND dates.date = data.date_added
""").show()
Output:
+----------+-------+-----+-----+
|date |user_id|sales|price|
+----------+-------+-----+-----+
|2020-01-02|01 |1 |4.0 |
|2020-01-03|01 |null |4.0 |
|2020-01-04|01 |null |4.0 |
|2020-01-05|01 |3 |4.0 |
|2020-01-02|02 |1 |5.0 |
|2020-01-03|02 |1 |5.0 |
|2020-01-04|02 |null |5.0 |
|2020-01-05|02 |2 |5.0 |
|2020-01-02|03 |null |1.0 |
|2020-01-03|03 |2 |1.0 |
|2020-01-04|03 |null |1.0 |
|2020-01-05|03 |5 |1.0 |
+----------+-------+-----+-----+
You can also generate the dates without a calendar table by using the sequence function. See my other answer here.
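For reference, a minimal sketch of that approach (assuming Spark 2.4+, where sequence accepts dates and a day interval, and the data_table name from the question):
-- Build the full date range directly from the data table instead of a calendar table
SELECT explode(sequence(min_d, max_d, interval 1 day)) AS date
FROM (
    SELECT MIN(date_added) AS min_d, MAX(date_added) AS max_d
    FROM data_table
)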

Let your original dataframe be df1. Then, for each user_id, you can build the full date sequence between its min and max date_added and call the result df2.
from pyspark.sql import functions as f
from pyspark.sql import Window

w = Window.partitionBy('user_id').orderBy(f.desc('date_added'))
df2 = df1.groupBy('user_id') \
    .agg(f.sequence(f.min('date_added'), f.max('date_added')).alias('date_added')) \
    .withColumn('date_added', f.explode('date_added'))
df2.join(df1, ['user_id', 'date_added'], 'left') \
    .withColumn('price', f.first('price').over(w)) \
    .orderBy('user_id', 'date_added') \
    .show()
+-------+----------+-----+-----+
|user_id|date_added|sales|price|
+-------+----------+-----+-----+
| 1|2020-01-02| 1| 4.0|
| 1|2020-01-03| null| 4.0|
| 1|2020-01-04| null| 4.0|
| 1|2020-01-05| 3| 4.0|
| 2|2020-01-02| 1| 5.0|
| 2|2020-01-03| 1| 5.0|
| 2|2020-01-04| null| 5.0|
| 2|2020-01-05| 2| 5.0|
| 3|2020-01-03| 2| 1.0|
| 3|2020-01-04| null| 1.0|
| 3|2020-01-05| 5| 1.0|
+-------+----------+-----+-----+
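Since the question mentions Spark SQL on Databricks, here is a sketch of the same per-user idea expressed in SQL (assuming Spark 2.4+ and the data_table name from the question; first(price, true) with a forward-looking frame fills price with the next non-null value):
WITH per_user AS (
    SELECT user_id, sequence(MIN(date_added), MAX(date_added), interval 1 day) AS dates
    FROM data_table
    GROUP BY user_id
),
all_dates AS (
    SELECT user_id, explode(dates) AS date_added
    FROM per_user
)
SELECT d.date_added AS date,
       d.user_id,
       t.sales,
       FIRST(t.price, true) OVER (
           PARTITION BY d.user_id
           ORDER BY d.date_added
           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       ) AS price
FROM all_dates d
LEFT JOIN data_table t
    ON d.user_id = t.user_id AND d.date_added = t.date_added
ORDER BY d.user_id, d.date_added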

Related

insert extra rows in query result sql

Given a table with entries at irregular timestamps, "breaks" must be inserted at regular 5-minute intervals (the associated data can/will be NULL).
I was thinking of getting the start time and making a subquery with a window function that adds 5-minute intervals to the start time, but I could only think of using row_number to increment the values.
WITH data as (
    select id, data,
           cast(date_and_time as double) * 1000 as time_milliseconds
    from t1
), -- original data
start_times as (
    select id, MIN(CAST(date_and_time as double) * 1000) as start_time
    from t1
    GROUP BY id
), -- first timestamp for each id
boundries as (
    SELECT T1.id,
           (row_number() OVER (PARTITION BY T1.id ORDER BY T1.date_and_time) - 1) * 300000 + start_times.start_time as boundry
    from T1
    INNER JOIN start_times ON start_times.id = T1.id
) -- increment the number of 5 min added on each row and later full join boundries table with original data
However, this limits me to the number of rows present for an id in the original data table, and if the timestamps are spread out, the number of rows cannot cover the number of 5-minute intervals that need to be added.
sample data:
initial data:
| id | value | timestamp    |
|----|-------|--------------|
| 1  | 3     | 12:00:01.011 |
| 1  | 4     | 12:03:30.041 |
| 1  | 5     | 12:12:20.231 |
| 1  | 3     | 15:00:00.312 |
data after my query:
| id | value | timestamp (UNIX) |
|----|-------|------------------|
| 1  | 3     | 12:00:01         |
| 1  | 4     | 12:03:30         |
| 1  | NULL  | 12:05:01         | <-- Data from "boundries"
| 1  | NULL  | 12:10:01         | <-- Data from "boundries"
| 1  | 5     | 12:12:20         |
| 1  | NULL  | 12:15:01         | <-- Data from "boundries"
| 1  | NULL  | 12:20:01         | <-- Data from "boundries"
| 1  | 3     | 15:00:00         | <-- Jumping directly to 15:00:00 (WRONG! :( need to insert more 5 min breaks here)
I was thinking of creating a temporary table inside Hive and filling it with x rows representing 5-minute intervals from the start time to the end time of the data table, but I couldn't find any way of accomplishing that.
Is there any way of using "for loops"? Any suggestions would be appreciated.
Thanks
You can try calculating the difference between the current timestamp and the next one, dividing by 300 to get the number of ranges, producing a string of spaces with length = num_ranges, and exploding it to generate rows.
Demo:
with your_table as ( -- initial data example
    select stack(3,
        1, 3, '2020-01-01 12:00:01.011',
        1, 4, '2020-01-01 12:03:30.041',
        1, 5, '2020-01-01 12:20:20.231'
    ) as (id, value, ts)
)
select id, value, ts, next_ts,
       diff_sec, num_intervals,
       from_unixtime(unix_timestamp(ts) + h.i * 300) new_ts,
       coalesce(from_unixtime(unix_timestamp(ts) + h.i * 300), ts) as calculated_timestamp
from
(
    select id, value, ts, next_ts,
           (unix_timestamp(next_ts) - unix_timestamp(ts)) diff_sec,
           floor((unix_timestamp(next_ts) - unix_timestamp(ts)) / 300) num_intervals -- diff in seconds / 5 min
    from
    (
        select id, value, ts, lead(ts) over(order by ts) next_ts
        from your_table
    ) s
) s
lateral view outer posexplode(split(space(cast(s.num_intervals as int)), ' ')) h as i, x -- this will generate rows
Result:
id value ts next_ts diff_sec num_intervals new_ts calculated_timestamp
1 3 2020-01-01 12:00:01.011 2020-01-01 12:03:30.041 209 0 2020-01-01 12:00:01 2020-01-01 12:00:01
1 4 2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010 3 2020-01-01 12:03:30 2020-01-01 12:03:30
1 4 2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010 3 2020-01-01 12:08:30 2020-01-01 12:08:30
1 4 2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010 3 2020-01-01 12:13:30 2020-01-01 12:13:30
1 4 2020-01-01 12:03:30.041 2020-01-01 12:20:20.231 1010 3 2020-01-01 12:18:30 2020-01-01 12:18:30
1 5 2020-01-01 12:20:20.231 \N \N \N \N 2020-01-01 12:20:20.231
Additional rows were added. I left all intermediate columns for debugging purposes.
A recursive query could be helpful here, but Hive does not support these (more info).
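For reference only, on an engine that does support recursive CTEs the boundary-generation idea could be sketched roughly like this (ANSI/PostgreSQL-style syntax, not runnable in Hive):
WITH RECURSIVE boundaries AS (
    -- seed: the first timestamp per id, plus the last timestamp as a stop condition
    SELECT id, MIN(ts) AS ts, MAX(ts) AS max_ts
    FROM your_table
    GROUP BY id
    UNION ALL
    -- step: keep adding 5 minutes until the last timestamp is reached
    SELECT id, ts + INTERVAL '5 minutes', max_ts
    FROM boundaries
    WHERE ts + INTERVAL '5 minutes' <= max_ts
)
SELECT id, ts FROM boundaries ORDER BY id, ts;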
You may consider creating the table outside of Hive or writing a UDF.
Either way, this query can be expensive, and the use of materialized views/tables is recommended depending on how frequently you run it.
The example shows a UDF inbetween created using pyspark to run the query. It:
- generates the values in between the min and max timestamp from the dataset, using CTEs and the UDF, to create a temporary table intervals
- generates all possible intervals using an expensive cross join in possible_records
- uses a left join to retrieve the records with actual values (for demonstration purposes I've represented the timestamp value as just the time string)
The code below shows how it was evaluated using Hive.
Example Code
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType,ArrayType
inbetween = lambda min_value,max_value : [*range(min_value,max_value,5*60)]
udf_inbetween = udf(inbetween,ArrayType(IntegerType()))
sqlContext.udf.register("inbetween",udf_inbetween)
sqlContext.sql("""
WITH max_timestamp(t) as (
select max(timestamp) as t from initial_data2
),
min_timestamp(t) as (
select min(timestamp) as t from initial_data2
),
intervals as (
select explode(inbetween(unix_timestamp(mint.t),unix_timestamp(maxt.t))) as interval_time FROM
min_timestamp mint, max_timestamp maxt
),
unique_ids as (
select distinct id from initial_data2
),
interval_times as (
select interval_time from (
select
cast(from_unixtime(interval_time) as timestamp) as interval_time
from
intervals
UNION
select distinct d.timestamp as interval_time from initial_data2 d
)
order by interval_time asc
),
possible_records as (
select
distinct
d.id,
i.interval_time
FROM
interval_times i, unique_ids d
)
select
p.id,
d.value,
split(cast(p.interval_time as string)," ")[1] as timestamp
FROM
possible_records p
LEFT JOIN
initial_data2 d ON d.id = p.id and d.timestamp = p.interval_time
ORDER BY p.id, p.interval_time
""").show(20)
Output
+---+-----+---------+
| id|value|timestamp|
+---+-----+---------+
| 1| 3| 12:00:01|
| 1| 4| 12:03:30|
| 1| null| 12:05:01|
| 1| null| 12:10:01|
| 1| 5| 12:12:20|
| 1| null| 12:15:01|
| 1| null| 12:20:01|
| 1| null| 12:25:01|
| 1| null| 12:30:01|
| 1| null| 12:35:01|
| 1| null| 12:40:01|
| 1| null| 12:45:01|
| 1| null| 12:50:01|
| 1| null| 12:55:01|
| 1| null| 13:00:01|
| 1| null| 13:05:01|
| 1| null| 13:10:01|
| 1| null| 13:15:01|
| 1| null| 13:20:01|
| 1| null| 13:25:01|
+---+-----+---------+
only showing top 20 rows
Data Prep to replicate
from pyspark.sql import Row

raw_data1 = [
{"id":1,"value":3,"timestam":"12:00:01"},
{"id":1,"value":4,"timestam":"12:03:30"},
{"id":1,"value":5,"timestam":"12:12:20"},
{"id":1,"value":3,"timestam":"15:00:00"},
]
raw_data = [*map(lambda entry : Row(**entry),raw_data1)]
initial_data = sqlContext.createDataFrame(raw_data,schema="id int, value int, timestam string ")
initial_data.createOrReplaceTempView('initial_data')
sqlContext.sql("create or replace temp view initial_data2 as select id,value,cast(timestam as timestamp) as timestamp from initial_data")

Using pyspark to create a segment array from a flat record

I have a sparsely populated table with values for various segments for unique user ids. I need to create an array with the user_id and the relevant segment headers only.
Please note that this is just an indicative dataset. I have several hundreds of segments like these.
------------------------------------------------
| user_id | seg1 | seg2 | seg3 | seg4 | seg5 |
------------------------------------------------
| 100 | M | null| 25 | null| 30 |
| 200 | null| null| 43 | null| 250 |
| 300 | F | 3000| null| 74 | null|
------------------------------------------------
I am expecting the output to be
-------------------------------
| user_id| segment_array |
-------------------------------
| 100 | [seg1, seg3, seg5] |
| 200 | [seg3, seg5] |
| 300 | [seg1, seg2, seg4] |
-------------------------------
Is there any function available in pyspark or pyspark-sql to accomplish this?
Thanks for your help!
I cannot find a direct way, but you can do this:
from pyspark.sql.functions import array, array_remove, col, lit, when

cols = df.columns[1:]
r = df.withColumn('array', array(*[when(col(c).isNotNull(), lit(c)).otherwise('notmatch') for c in cols])) \
    .withColumn('array', array_remove('array', 'notmatch'))
r.show()
+-------+----+----+----+----+----+------------------+
|user_id|seg1|seg2|seg3|seg4|seg5| array|
+-------+----+----+----+----+----+------------------+
| 100| M|null| 25|null| 30|[seg1, seg3, seg5]|
| 200|null|null| 43|null| 250| [seg3, seg5]|
| 300| F|3000|null| 74|null|[seg1, seg2, seg4]|
+-------+----+----+----+----+----+------------------+
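If you would rather express this in Spark SQL, the same when/array/array_remove idea can be sketched as below (assuming Spark 2.4+ for array_remove and a view named segments, which is a name made up here; with several hundred segments you would generate the CASE list programmatically rather than by hand):
SELECT user_id,
       array_remove(array(
           CASE WHEN seg1 IS NOT NULL THEN 'seg1' ELSE 'notmatch' END,
           CASE WHEN seg2 IS NOT NULL THEN 'seg2' ELSE 'notmatch' END,
           CASE WHEN seg3 IS NOT NULL THEN 'seg3' ELSE 'notmatch' END,
           CASE WHEN seg4 IS NOT NULL THEN 'seg4' ELSE 'notmatch' END,
           CASE WHEN seg5 IS NOT NULL THEN 'seg5' ELSE 'notmatch' END
       ), 'notmatch') AS segment_array
FROM segments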
Not sure this is the best way, but I'd attack it this way:
There's the collect_set function, which will always give you unique values across the list of values you aggregate over.
Do a union for each segment:
from pyspark.sql import functions as fn
from pyspark.sql.functions import col, lit, collect_list

df_seg_1 = df.select(
    'user_id',
    fn.when(
        col('seg1').isNotNull(),
        lit('seg1')
    ).alias('segment')
)
# repeat for all segments
df = df_seg_1.union(df_seg_2).union(...)
df.groupBy('user_id').agg(collect_list('segment'))

SQL finding overlapping dates given start and end date

Given a data set in MS SQL Server 2012 where travelers take trips (with trip_ID as UID) and where each trip has a start_date and an end_date, I'm looking to find the trip_IDs for each traveler where trips overlap, and the range of that overlap. So if the initial table looks like this:
| trip_ID | traveler | start_date | end_date | trip_length |
|---------|----------|------------|------------|-------------|
| AB24 | Alpha | 2017-01-29 | 2017-01-31 | 2|
| BA02 | Alpha | 2017-01-31 | 2017-02-10 | 10|
| CB82 | Charlie | 2017-02-20 | 2017-02-23 | 3|
| CA29 | Bravo | 2017-02-26 | 2017-02-28 | 2|
| AB14 | Charlie | 2017-03-06 | 2017-03-08 | 2|
| DA45 | Bravo | 2017-03-26 | 2017-03-29 | 3|
| BA22 | Bravo | 2017-03-29 | 2017-04-03 | 5|
I'm looking for a query that will append three columns to the original table: overlap_id, overlap_start, overlap_end. The idea is that each row will have a value (or NULL) for an overlapping trip, along with the start and end dates of the overlap itself. Like this:
| trip_ID | traveler | start_date | end_date   | trip_length | overlap_id | overlap_start | overlap_end |
|---------|----------|------------|------------|-------------|------------|---------------|-------------|
| AB24    | Alpha    | 2017-01-29 | 2017-01-31 | 2           | BA02       | 2017-01-31    | 2017-01-31  |
| BA02    | Alpha    | 2017-01-31 | 2017-02-10 | 10          | AB24       | 2017-01-31    | 2017-01-31  |
| CB82    | Charlie  | 2017-02-20 | 2017-02-23 | 3           | NULL       | NULL          | NULL        |
| CA29    | Bravo    | 2017-02-26 | 2017-02-28 | 2           | NULL       | NULL          | NULL        |
| AB14    | Charlie  | 2017-03-06 | 2017-03-08 | 2           | NULL       | NULL          | NULL        |
| DA45    | Bravo    | 2017-03-26 | 2017-03-29 | 3           | BA22       | 2017-03-28    | 2017-03-29  |
| BA22    | Bravo    | 2017-03-28 | 2017-04-03 | 5           | DA45       | 2017-03-28    | 2017-03-29  |
I've tried variations of Overlapping Dates in SQL to inform my approach but it's not returning the right answers. I'm only looking for overlaps for the same traveler (i.e., within Alpha or Bravo, not between Alpha and Bravo).
For the overlap_id column, I think the code would have to test whether a trip's start_date plus range(0, trip_length) returns a value within the range of dates between start_date and end_date for any other trip by the same traveler; if so, the trip_id is updated to equal the id of the matching trip. If this is the right concept, I'm not sure how to make a variable out of trip_length so I can test a range of values for it, i.e., run this for all values of trip_length - x until trip_length - x = 0.
--This might be the bare bones of an answer
update table
set overlap_id = CASE
WHEN ( DATEADD(day, trip_length, start_date) = SELECT (DATEADD(day, trip_length, start_date) from table where traveler = traveler)
You can join the table with itself (the join condition is described here):
SELECT t.*, o.trip_ID, o.start_date, o.end_date
FROM t
LEFT JOIN t AS o
    ON t.trip_ID <> o.trip_ID       -- trip always overlaps itself so exclude it
    AND o.traveler = t.traveler     -- same traveller
    AND t.start_date <= o.end_date  -- overlap test
    AND t.end_date >= o.start_date
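The query above returns the overlapping trip's own start and end dates; if you also want the overlap window itself (overlap_start/overlap_end), it is the later of the two start dates and the earlier of the two end dates. A rough, untested sketch building on the same join (SQL Server 2012 has no GREATEST/LEAST, so CASE expressions stand in):
SELECT t.*,
       o.trip_ID AS overlap_id,
       CASE WHEN o.trip_ID IS NULL THEN NULL
            WHEN o.start_date > t.start_date THEN o.start_date
            ELSE t.start_date END AS overlap_start,
       CASE WHEN o.trip_ID IS NULL THEN NULL
            WHEN o.end_date < t.end_date THEN o.end_date
            ELSE t.end_date END AS overlap_end
FROM t
LEFT JOIN t AS o
    ON t.trip_ID <> o.trip_ID
    AND o.traveler = t.traveler
    AND t.start_date <= o.end_date
    AND t.end_date >= o.start_date;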

Sql Server Aggregation or Pivot Table Query

I'm trying to write a query that will tell me the number of customers who had a certain number of transactions each week. I don't know where to start with the query, but I'd assume it involves an aggregate or pivot function. I'm working in SqlServer management studio.
Currently the data looks like this, where the first column is the customer id and each subsequent column is a week:
| Customer | 1 | 2 | 3 | 4 |
|----------|---|---|---|---|
| 001      | 1 | 0 | 2 | 2 |
| 002      | 0 | 2 | 1 | 0 |
| 003      | 0 | 4 | 1 | 1 |
| 004      | 1 | 0 | 0 | 1 |
I'd like to see a return like the following:
| Visits | 1 | 2 | 3 | 4 |
|--------|---|---|---|---|
| 0      | 2 | 2 | 1 | 0 |
| 1      | 2 | 0 | 2 | 2 |
| 2      | 0 | 1 | 1 | 1 |
| 4      | 0 | 1 | 0 | 0 |
What I want is to get the count of customer transactions per week. E.g. during the 1st week, 2 customers (i.e. 002 and 003) had 0 transactions, 2 customers (i.e. 001 and 004) had 1 transaction, and zero customers had more than 1 transaction.
The query below will get you the result you want, but note that it has the column names hard coded. It's easy to add more week columns, but if the number of columns is unknown then you might want to look into a solution using dynamic SQL (which would require accessing the information schema to get the column names). It's not that hard to turn it into a fully dynamic version though.
select
Visits
, coalesce([1],0) as Week1
, coalesce([2],0) as Week2
, coalesce([3],0) as Week3
, coalesce([4],0) as Week4
from (
select *, count(*) c from (
select '1' W, week1 Visits from t union all
select '2' W, week2 Visits from t union all
select '3' W, week3 Visits from t union all
select '4' W, week4 Visits from t ) a
group by W, Visits
) x pivot ( max (c) for W in ([1], [2], [3], [4]) ) as pvt;
In the query your table is called t and the output is:
Visits Week1 Week2 Week3 Week4
0 2 2 1 1
1 2 0 2 2
2 0 1 1 1
4 0 1 0 0
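Since the dynamic SQL variant is mentioned above but not shown, here is a rough, untested sketch of that idea. It assumes the source table is t, the first column is named Customer, and every remaining column is a week column of the same type; unlike the static query it leaves NULL instead of 0 where a visit count never occurs (wrapping each pivoted column in COALESCE would need a second dynamically built list).
DECLARE @cols nvarchar(max) =
    STUFF((SELECT ',' + QUOTENAME(c.name)
           FROM sys.columns c
           WHERE c.object_id = OBJECT_ID('t') AND c.name <> 'Customer'
           ORDER BY c.column_id
           FOR XML PATH('')), 1, 1, '');

DECLARE @sql nvarchar(max) = N'
select Visits, ' + @cols + N'
from (
    select W, Visits, count(*) c
    from t
    unpivot (Visits for W in (' + @cols + N')) u
    group by W, Visits
) src
pivot (max(c) for W in (' + @cols + N')) pvt
order by Visits;';

EXEC sp_executesql @sql;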

SQL Query Microsoft Access - change horizontal field to vertical field

I need help with an Access SQL query.
I created a view in Access using 4 tables. My problem shows up when I want to turn some fields vertical: I know how to do it for a two-field matrix, but not for more than that.
This is what my data looks like before the change:
|DataKioskID | KioskName | YearFiscal | MonthReport | ProductID | ProductName | Sales | Stock |
|AB0101061501| Sarana Tani | 2015 | 6 | P15 | Advanta | 56| 12|
|AB0101061501| Sarana Tani | 2015 | 6 | P16 | Advanta | 23| 15|
|AB0101061501| Sarana Tani | 2015 | 6 | P02 | Advanta | 14| 12|
|AB0102061501| TaniLestari | 2015 | 6 | P02 | Advanta | 15| 14|
|AB0102061501| TaniLestari | 2015 | 6 | P15 | Advanta | 12| 15|
|AB0102061501| TaniLestari | 2015 | 6 | P16 | Advanta | 14| 23|
Code:
SELECT Data_Kiosk_Header.DataKioskID, Master_Kiosk.KioskName, Data_Kiosk_Header.YearFiscal
, Max(Data_Kiosk_Header.MonthReport) AS monthReport
, Max(IIf(Data_Kiosk_Detail.ProductID='P15',Data_Kiosk_Detail.Sales,0)) AS Advanta_Sales
, Max(IIf(Data_Kiosk_Detail.ProductID='P16',Data_Kiosk_Detail.Sales,0)) AS Agro_Sales
, Max(IIf(Data_Kiosk_Detail.ProductID='P02',Data_Kiosk_Detail.Sales,0)) AS P12_Sales
, Max(IIf(Data_Kiosk_Detail.ProductID='P15',Data_Kiosk_Detail.Stocks,0)) AS Advanta_Stocks
, Max(IIf(Data_Kiosk_Detail.ProductID='P16',Data_Kiosk_Detail.Stocks,0)) AS Agro_Stocks
, Max(IIf(Data_Kiosk_Detail.ProductID='P02',Data_Kiosk_Detail.Stocks,0)) AS P12_Stocks
FROM Master_Kiosk
INNER JOIN (Data_Kiosk_Header INNER JOIN (Data_Kiosk_Detail
INNER JOIN Master_Product ON Data_Kiosk_Detail.ProductID = Master_Product.ProductID) ON Data_Kiosk_Header.DataKioskID = Data_Kiosk_Detail.DataKioskID) ON Master_Kiosk.kioskid = Data_Kiosk_Header.KioskName
GROUP BY Data_Kiosk_Header.DataKioskID, Master_Kiosk.KioskName, Data_Kiosk_Header.YearFiscal;
After running the code it becomes like this:
DataKioskID | KioskName |YearFiscal |monthReport |Advanta_Sales |Agro_Sales |P12_Sales |Advanta_Stocks |Agro_Stocks |P12_Stocks |
AB0101061501| Sarana Tani |2015 |6 |56 |23 |14 |12 |15 |12 |
AB0102061501| Tani Lestari|2015 |6 |12 |14 |15 |15 |23 |14 |
Can anybody help me? I want it to be like this:
|DataKioskID | KioskName | YearFiscal | MonthReport | Sales | Stock |
| | | | | Advanta | Agro | P12 | Advanta | Agro | P12 |
|AB0101061501| Sarana Tani | 2015 | 6 | 56 | 23| 14| 12 | 15| 12|
|AB0102061501| LestariTani | 2015 | 6 | 15 | 12| 14| 14 | 15| 16|
Here I give the DB so you can try what I mean:
DB Source
Exactly what you want is not possible, at least at the query level, because you have two-level grouping; a report is the answer for that.
Furthermore, in order to get the info as a "single" query, you need the following.
First, a crosstab query for sales:
TRANSFORM Max(Data_Kiosk_Detail.Sales) AS MaxOfSales
SELECT Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport AS monthReport
,"Sales" AS Info
FROM Master_Kiosk
INNER JOIN (
Data_Kiosk_Header INNER JOIN (
Data_Kiosk_Detail INNER JOIN Master_Product ON Data_Kiosk_Detail.ProductID = Master_Product.ProductID
) ON Data_Kiosk_Header.DataKioskID = Data_Kiosk_Detail.DataKioskID
) ON Master_Kiosk.kioskid = Data_Kiosk_Header.KioskName
GROUP BY Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport
,"Sales"
PIVOT Data_Kiosk_Detail.ProductID;
Second, a crosstab query for stocks:
TRANSFORM Max(Data_Kiosk_Detail.Stocks) AS MaxOfStocks
SELECT Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport AS monthReport
,"Stocks" AS Info
FROM Master_Kiosk
INNER JOIN (
Data_Kiosk_Header INNER JOIN (
Data_Kiosk_Detail INNER JOIN Master_Product ON Data_Kiosk_Detail.ProductID = Master_Product.ProductID
) ON Data_Kiosk_Header.DataKioskID = Data_Kiosk_Detail.DataKioskID
) ON Master_Kiosk.kioskid = Data_Kiosk_Header.KioskName
GROUP BY Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport
,"Stocks"
PIVOT Data_Kiosk_Detail.ProductID;
Then you join them together with a union query:
select * from MaxOfSales
UNION select * from MaxOfStocks;
Then you could use the above query to create a report that shows what you need.