I need help with an Access SQL query.
I created a view in Access using 4 tables. My problem appears when I want to turn some row values into columns. I know how to do it for a two-level matrix, but not for more levels than that.
This is what my data looks like before the change:
|DataKioskID | KioskName | YearFiscal | MonthReport | ProductID | ProductName | Sales | Stock |
|AB0101061501| Sarana Tani | 2015 | 6 | P15 | Advanta | 56| 12|
|AB0101061501| Sarana Tani | 2015 | 6 | P16 | Advanta | 23| 15|
|AB0101061501| Sarana Tani | 2015 | 6 | P02 | Advanta | 14| 12|
|AB0102061501| TaniLestari | 2015 | 6 | P02 | Advanta | 15| 14|
|AB0102061501| TaniLestari | 2015 | 6 | P15 | Advanta | 12| 15|
|AB0102061501| TaniLestari | 2015 | 6 | P16 | Advanta | 14| 23|
Code:
SELECT Data_Kiosk_Header.DataKioskID, Master_Kiosk.KioskName, Data_Kiosk_Header.YearFiscal
, Max(Data_Kiosk_Header.MonthReport) AS monthReport
, Max(IIf(Data_Kiosk_Detail.ProductID='P15',Data_Kiosk_Detail.Sales,0)) AS Advanta_Sales
, Max(IIf(Data_Kiosk_Detail.ProductID='P16',Data_Kiosk_Detail.Sales,0)) AS Agro_Sales
, Max(IIf(Data_Kiosk_Detail.ProductID='P02',Data_Kiosk_Detail.Sales,0)) AS P12_Sales
, Max(IIf(Data_Kiosk_Detail.ProductID='P15',Data_Kiosk_Detail.Stocks,0)) AS Advanta_Stocks
, Max(IIf(Data_Kiosk_Detail.ProductID='P16',Data_Kiosk_Detail.Stocks,0)) AS Agro_Stocks
, Max(IIf(Data_Kiosk_Detail.ProductID='P02',Data_Kiosk_Detail.Stocks,0)) AS P12_Stocks
FROM Master_Kiosk
INNER JOIN (Data_Kiosk_Header INNER JOIN (Data_Kiosk_Detail
INNER JOIN Master_Product ON Data_Kiosk_Detail.ProductID = Master_Product.ProductID) ON Data_Kiosk_Header.DataKioskID = Data_Kiosk_Detail.DataKioskID) ON Master_Kiosk.kioskid = Data_Kiosk_Header.KioskName
GROUP BY Data_Kiosk_Header.DataKioskID, Master_Kiosk.KioskName, Data_Kiosk_Header.YearFiscal;
After running the code, the result looks like this:
DataKioskID | KioskName |YearFiscal |monthReport |Advanta_Sales |Agro_Sales |P12_Sales |Advanta_Stocks |Agro_Stocks |P12_Stocks |
AB0101061501| Sarana Tani |2015 |6 |56 |23 |14 |12 |15 |12 |
AB0102061501| Tani Lestari|2015 |6 |12 |14 |15 |15 |23 |14 |
Can anybody help me? I want it to look like this:
|DataKioskID | KioskName | YearFiscal | MonthReport | Sales | Stock |
| | | | | Advanta | Agro | P12 | Advanta | Agro | P12 |
|AB0101061501| Sarana Tani | 2015 | 6 | 56 | 23| 14| 12 | 15| 12|
|AB0102061501| LestariTani | 2015 | 6 | 15 | 12| 14| 14 | 15| 16|
Here is the DB so you can try what I mean:
DB Source
Exactly what you want is not possible, at least at the query level, because you have two levels of grouping. A report is the answer.
Furthermore, in order to get the info as a "single" query you need the following.
First, a crosstab query for sales:
TRANSFORM Max(Data_Kiosk_Detail.Sales) AS MaxOfSales
SELECT Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport AS monthReport
,"Sales" AS Info
FROM Master_Kiosk
INNER JOIN (
Data_Kiosk_Header INNER JOIN (
Data_Kiosk_Detail INNER JOIN Master_Product ON Data_Kiosk_Detail.ProductID = Master_Product.ProductID
) ON Data_Kiosk_Header.DataKioskID = Data_Kiosk_Detail.DataKioskID
) ON Master_Kiosk.kioskid = Data_Kiosk_Header.KioskName
GROUP BY Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport
,"Sales"
PIVOT Data_Kiosk_Detail.ProductID;
Second, a crosstab query for stocks:
TRANSFORM Max(Data_Kiosk_Detail.Stocks) AS MaxOfStocks
SELECT Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport AS monthReport
,"Stocks" AS Info
FROM Master_Kiosk
INNER JOIN (
Data_Kiosk_Header INNER JOIN (
Data_Kiosk_Detail INNER JOIN Master_Product ON Data_Kiosk_Detail.ProductID = Master_Product.ProductID
) ON Data_Kiosk_Header.DataKioskID = Data_Kiosk_Detail.DataKioskID
) ON Master_Kiosk.kioskid = Data_Kiosk_Header.KioskName
GROUP BY Data_Kiosk_Header.DataKioskID
,Master_Kiosk.KioskName
,Data_Kiosk_Header.YearFiscal
,Data_Kiosk_Header.MonthReport
,"Stocks"
PIVOT Data_Kiosk_Detail.ProductID;
Then you combine them with a union query (here the two crosstab queries are assumed to be saved as MaxOfSales and MaxOfStocks):
select * from MaxOfSales
UNION select * from MaxOfStocks;
You can then use the union query as the record source of a report that shows the layout you need.
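To illustrate the shape of the unioned result, here is a minimal sketch in Python with the standard-library sqlite3 module. It is not Access SQL: SQLite has no TRANSFORM/PIVOT, so each crosstab is emulated with conditional aggregation, and the table name `detail` and helper `pivot_sql` are hypothetical names introduced only for this example. The rows are copied from the question.

```python
import sqlite3

# Hypothetical single table standing in for the joined Access tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detail (DataKioskID TEXT, ProductID TEXT, Sales INTEGER, Stocks INTEGER)"
)
conn.executemany(
    "INSERT INTO detail VALUES (?, ?, ?, ?)",
    [
        ("AB0101061501", "P15", 56, 12),
        ("AB0101061501", "P16", 23, 15),
        ("AB0101061501", "P02", 14, 12),
        ("AB0102061501", "P02", 15, 14),
        ("AB0102061501", "P15", 12, 15),
        ("AB0102061501", "P16", 14, 23),
    ],
)


def pivot_sql(measure):
    """Build one 'crosstab' SELECT; measure must be a trusted column name."""
    return f"""
        SELECT DataKioskID, '{measure}' AS Info,
               MAX(CASE WHEN ProductID = 'P15' THEN {measure} END) AS P15,
               MAX(CASE WHEN ProductID = 'P16' THEN {measure} END) AS P16,
               MAX(CASE WHEN ProductID = 'P02' THEN {measure} END) AS P02
        FROM detail
        GROUP BY DataKioskID
    """


# One row per kiosk and measure, like the union of the two crosstab queries.
query = (
    pivot_sql("Sales")
    + " UNION ALL "
    + pivot_sql("Stocks")
    + " ORDER BY DataKioskID, Info"
)
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each kiosk ends up with one "Sales" row and one "Stocks" row, which is the form a report can group into the two-level header.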
I am looking for a way to concatenate the result of the table into one row.
I have 4 tables;
Suppliers table
+--+----------------+----------------+
|id|name |hook_name |
+--+----------------+----------------+
|1 |724 |724 |
|2 |Air |air |
|3 |Akustik |akustik |
|4 |Almira |almira |
+--+----------------+----------------+
Supplier Offices;
(label column represents pickup/dropoff string)
+---+-----------+----------+-------+
|id |supplier_id| zip_code | label |
+---+-----------+----------+-------+
|95 |24         |25325     | 344   | <- supplier_id 24 has office location 77,98 (label pickup)
|96 |24         |9535      | 93    | <- same. only label different (label dropoff)
|97 |1          |2858      | 95    |
|98 |1          |50285     | 954   |
|99 |1          |10094     | 24    |
|100|1          |4353      | 59    |
+---+-----------+----------+-------+
OfficeLocations (Pivot table)
+------------------+-----------+
|supplier_office_id|location_id|
+------------------+-----------+
|95 |77 | <- location I want to concatenate `supplier_id = 24` (istanbul)
|96 |98 | <- location I want to concatenate `supplier_id = 24` (london)
|97 |77 |
|98 |77 |
+------------------+-----------+
Locations
+---+----------+
|id |name      |
+---+----------+
|77 |istanbul  |
|96 |berlin    |
|97 |newyork   |
|98 |london    |
+---+----------+
I want to find the office for the given location information.
I haven't managed to create a custom column for the label.
If I want to access the office information of locations 1 and 2 I want to get something like this;
+-------------+--------------+---------------+
| supplier_id | pickup_label | dropoff_label |
+-------------+--------------+---------------+
| 95          | 344          | 93            |
+-------------+--------------+---------------+
This is how far I've been able to get with my PostgreSQL query:
SELECT supplier_offices.id,
supplier_offices.supplier_id
FROM "supplier_offices"
INNER JOIN suppliers on supplier_offices.supplier_id = suppliers.id
INNER JOIN office_locations on supplier_offices.id = office_locations.supplier_office_id
AND "office_locations"."location_id" IN (77, 98)
This code works, if I understand what you mean. It was written for generic SQL, but with little or no change I think it will work in PostgreSQL as well. Note that it simply treats the larger label as pickup and the smaller one as dropoff.
select supplier_offices.supplier_id, max(supplier_offices.label) as pickup_label, min(supplier_offices.label) as dropoff_label
from supplier_offices
inner join suppliers on supplier_offices.supplier_id = suppliers.id
inner join office_locations on supplier_offices.id = office_locations.supplier_office_id
where office_locations.location_id in (77,98)
group by supplier_offices.supplier_id
I think you want some sort of aggregation. It is entirely unclear how pickup locations are identified versus dropoff. But something like this:
SELECT so.supplier_id,
ARRAY_AGG(location_id) FILTER (WHERE so.label in (344)) as pickup,
ARRAY_AGG(location_id) FILTER (WHERE so.label not in (344)) as dropoff
FROM "supplier_offices" so JOIN
office_locations ol
ON so.id = ol.supplier_office_id AND
ol.location_id IN (77, 98)
GROUP BY so.supplier_id;
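As a small runnable illustration of the aggregation idea, the sketch below uses Python's standard-library sqlite3 module rather than PostgreSQL. It assumes, since the question does not say so explicitly, that the first requested location (77) is the pickup and the second (98) is the dropoff; `PICKUP_LOCATION` and `DROPOFF_LOCATION` are names introduced here. Labels are stored as integers for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supplier_offices (id INTEGER, supplier_id INTEGER, zip_code TEXT, label INTEGER);
    CREATE TABLE office_locations (supplier_office_id INTEGER, location_id INTEGER);
    INSERT INTO supplier_offices VALUES
        (95, 24, '25325', 344), (96, 24, '9535', 93),
        (97, 1, '2858', 95), (98, 1, '50285', 954);
    INSERT INTO office_locations VALUES (95, 77), (96, 98), (97, 77), (98, 77);
    """
)

# Assumption: which location plays which role is supplied by the caller.
PICKUP_LOCATION, DROPOFF_LOCATION = 77, 98

rows = conn.execute(
    """
    SELECT so.supplier_id,
           MAX(CASE WHEN ol.location_id = :pickup THEN so.label END) AS pickup_label,
           MAX(CASE WHEN ol.location_id = :dropoff THEN so.label END) AS dropoff_label
    FROM supplier_offices so
    JOIN office_locations ol ON so.id = ol.supplier_office_id
    WHERE ol.location_id IN (:pickup, :dropoff)
    GROUP BY so.supplier_id
    -- keep only suppliers that have an office in both locations
    HAVING COUNT(DISTINCT ol.location_id) = 2
    """,
    {"pickup": PICKUP_LOCATION, "dropoff": DROPOFF_LOCATION},
).fetchall()
print(rows)
```

Supplier 1 only has offices in location 77 in the sample data, so the HAVING clause drops it and only supplier 24 remains.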
I have a sparsely populated table with values for various segments for unique user ids. I need to create an array with unique_id and relevant segment headers only
Please note that this is just an indicative dataset. I have several hundreds of segments like these.
------------------------------------------------
| user_id | seg1 | seg2 | seg3 | seg4 | seg5 |
------------------------------------------------
| 100 | M | null| 25 | null| 30 |
| 200 | null| null| 43 | null| 250 |
| 300 | F | 3000| null| 74 | null|
------------------------------------------------
I am expecting the output to be
-------------------------------
| user_id| segment_array |
-------------------------------
| 100 | [seg1, seg3, seg5] |
| 200 | [seg3, seg5] |
| 300 | [seg1, seg2, seg4] |
-------------------------------
Is there any function available in pyspark or pyspark-sql to accomplish this?
Thanks for your help!
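I cannot find a direct way, but you can do this:
from pyspark.sql.functions import array, array_remove, col, lit, when
# every column except user_id is a segment column
cols = df.columns[1:]
# put the column name where the value is populated, a placeholder otherwise, then drop the placeholders
r = df.withColumn('array', array(*[when(col(c).isNotNull(), lit(c)).otherwise('notmatch') for c in cols])) \
.withColumn('array', array_remove('array', 'notmatch'))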
r.show()
+-------+----+----+----+----+----+------------------+
|user_id|seg1|seg2|seg3|seg4|seg5| array|
+-------+----+----+----+----+----+------------------+
| 100| M|null| 25|null| 30|[seg1, seg3, seg5]|
| 200|null|null| 43|null| 250| [seg3, seg5]|
| 300| F|3000|null| 74|null|[seg1, seg2, seg4]|
+-------+----+----+----+----+----+------------------+
I'm not sure this is the best way, but I'd attack it like this.
The collect_set function gives you the unique values across the list of values you aggregate over.
Build one DataFrame per segment and union them:
from pyspark.sql.functions import col, collect_set, lit, when
df_seg_1 = df.select(
    'user_id',
    # when() without otherwise() yields null if seg1 is empty
    when(
        col('seg1').isNotNull(),
        lit('seg1')
    ).alias('segment')
)
# repeat for all segments
df = df_seg_1.union(df_seg_2).union(...)
# collect_set skips nulls, so only populated segments remain
df.groupBy('user_id').agg(collect_set('segment'))
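Both answers boil down to the same per-row rule: keep the names of the columns whose value is not null. The following plain-Python sketch (no Spark involved) shows that rule on the sample rows from the question; `populated_segments` is a hypothetical helper name used only here.

```python
# Sample rows from the question, with None standing in for null.
rows = [
    {"user_id": 100, "seg1": "M", "seg2": None, "seg3": 25, "seg4": None, "seg5": 30},
    {"user_id": 200, "seg1": None, "seg2": None, "seg3": 43, "seg4": None, "seg5": 250},
    {"user_id": 300, "seg1": "F", "seg2": 3000, "seg3": None, "seg4": 74, "seg5": None},
]


def populated_segments(row, key="user_id"):
    """Return the names of all non-key columns whose value is not None."""
    return [name for name, value in row.items() if name != key and value is not None]


segment_arrays = {row["user_id"]: populated_segments(row) for row in rows}
print(segment_arrays)
```

Because the rule only looks at column names, it scales to hundreds of segment columns without listing them by hand, which is also what the list comprehension over `df.columns[1:]` does in the first answer.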
I'm hoping that I'm overthinking this, but I need to sum a column where I have no unique key to join on, and when I do join, the rows are doubled up.
This is my current SQL. It works until I add the join on vwBatchInData, and then it doubles up every record. What is the best way to achieve this?
select b.fldBatchID as 'ID',SUM(bIn.fldBatchDetailsWeight) as 'Batch In', sum(t.fldTransactionNetWeight) as 'Batch Out' , format((sum(t.fldTransactionNetWeight) / sum(bIn.fldBatchDetailsWeight)),'P2' ) as 'Yield'
from [TRANSACTION] t
right join vwBatchInData bIn on bIn.fldBatchID = t.fldBatchID
inner join Batch b on b.fldBatchID = t.fldBatchID
where CAST(b.fldBatchDate as date) = '2020-03-04'
group by b.fldBatchID
vwBatchInData Table
+------------+---------------+-----------------------+
| fldBatchID | fldKillNumber | fldBatchDetailsWeight |
+------------+---------------+-----------------------+
| 2862 | 601598 | 164.40 |
| 2862 | 601599 | 190.80 |
| 2862 | 601596 | 195.00 |
| 2862 | 601597 | 200.20 |
| 2862 | 601594 | 176.60 |
+------------+---------------+-----------------------+
Transaction Table
+------------+------------------+-------------------------+
| fldBatchID | fldTransactionID | fldTransactionNetWeight |
+------------+------------------+-------------------------+
| 2862 | 10242352 | 16.26 |
| 2862 | 10242353 | 22.82 |
| 2862 | 10242362 | 18.52 |
| 2862 | 10242363 | 21.44 |
| 2862 | 10242364 | 20.32 |
+------------+------------------+-------------------------+
Batch Table
+------------+-------------------------+
| fldBatchID | fldBatchDate |
+------------+-------------------------+
| 2862 | 2020-03-04 00:00:00.000 |
+------------+-------------------------+
Desired output with the above snippets
+------+----------+-----------+---------+
| ID | Batch In | Batch Out | Yield |
+------+----------+-----------+---------+
| 2862 | 927.00 | 90.36 | 10.76 % |
+------+----------+-----------+---------+
I think you just want to aggregate before joining:
select b.fldBatchID as ID,
bIn.fldBatchDetailsWeight as batch_in,
t.fldTransactionNetWeight as batch_out,
format(t.fldTransactionNetWeight / bIn.fldBatchDetailsWeight, 'P2') as Yield
from batch b left join
-- one row per batch with the total weight in
(select bin.fldBatchID, sum(fldBatchDetailsWeight) as fldBatchDetailsWeight
from vwBatchInData bin
group by bin.fldBatchID
) bIn
on bIn.fldBatchID = b.fldBatchID left join
-- one row per batch with the total weight out
(select t.fldBatchID, sum(fldTransactionNetWeight) as fldTransactionNetWeight
from [TRANSACTION] t
group by t.fldBatchID
) t
on t.fldBatchID = b.fldBatchID
where CAST(b.fldBatchDate as date) = '2020-03-04';
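To see why aggregating before joining matters, here is a minimal sketch using Python's standard-library sqlite3 module instead of SQL Server. The table names `batch_in` and `txn` are stand-ins chosen for this example, and the rows are the snippets shown in the question, so the totals need not match the asker's desired output, which was presumably computed from the full tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE batch_in (fldBatchID INTEGER, fldBatchDetailsWeight REAL);
    CREATE TABLE txn (fldBatchID INTEGER, fldTransactionNetWeight REAL);
    INSERT INTO batch_in VALUES
        (2862, 164.40), (2862, 190.80), (2862, 195.00), (2862, 200.20), (2862, 176.60);
    INSERT INTO txn VALUES
        (2862, 16.26), (2862, 22.82), (2862, 18.52), (2862, 21.44), (2862, 20.32);
    """
)

# Joining first pairs every batch_in row with every txn row of the same
# batch (5 x 5 rows here), so both sums are inflated five-fold.
naive = conn.execute(
    """
    SELECT SUM(b.fldBatchDetailsWeight), SUM(t.fldTransactionNetWeight)
    FROM batch_in b
    JOIN txn t ON t.fldBatchID = b.fldBatchID
    """
).fetchone()

# Aggregating each table to one row per batch before joining avoids that.
fixed = conn.execute(
    """
    SELECT b.total_in, t.total_out
    FROM (SELECT fldBatchID, SUM(fldBatchDetailsWeight) AS total_in
          FROM batch_in GROUP BY fldBatchID) b
    JOIN (SELECT fldBatchID, SUM(fldTransactionNetWeight) AS total_out
          FROM txn GROUP BY fldBatchID) t
      ON t.fldBatchID = b.fldBatchID
    """
).fetchone()

print("join then sum:", [round(v, 2) for v in naive])
print("sum then join:", [round(v, 2) for v in fixed])
```

The inflation factor equals the number of matching rows on the other side of the join, which is why the problem only shows up once the second detail table is added.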
I have data like the following. I have to display the year_month values as columns. How should I do this? I am new to Spark.
scala> spark.sql("""select sum(actual_calls_count),year_month from ph_com_b_gbl_dice.dm_rep_customer_call group by year_month""")
res0: org.apache.spark.sql.DataFrame = [sum(actual_calls_count): bigint, year_month: string]
scala> res0.show
+-----------------------+----------+
|sum(actual_calls_count)|year_month|
+-----------------------+----------+
| 1| 2019-10|
| 3693| 2018-10|
| 7| 2019-11|
| 32| 2017-10|
| 94| 2019-03|
| 10527| 2018-06|
| 4774| 2017-05|
| 1279| 2017-11|
| 331982| 2018-03|
| 315767| 2018-02|
| 7097| 2017-03|
| 8| 2017-08|
| 3| 2019-07|
| 3136| 2017-06|
| 6088| 2017-02|
| 6344| 2017-04|
| 223426| 2018-05|
| 9819| 2018-08|
| 1| 2017-07|
| 68| 2019-05|
+-----------------------+----------+
only showing top 20 rows
My output should be like this :-
sum(actual_calls_count)|year_month1 | year_month2 | year_month3 and so on..
scala> df.groupBy(lit(1)).pivot(col("year_month")).agg(concat_ws("",collect_list(col("sum")))).drop("1").show(false)
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
|2017-02|2017-03|2017-04|2017-05|2017-06|2017-07|2017-08|2017-10|2017-11|2018-02|2018-03|2018-05|2018-06|2018-08|2018-10|2019-03|2019-05|2019-07|2019-10|2019-11|
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
|6088 |7097 |6344 |4774 |3136 |1 |8 |32 |1279 |315767 |331982 |223426 |10527 |9819 |3693 |94 |68 |3 |1 |7 |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
I have a flat table structure which I've turned into a column based table. I'm struggling with getting the rowId from my raw data to appear in my column based table. Any help greatly appreciated.
Raw data in table derived from three different tables:
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 1 |rowId |[NULL] |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
| 2 |rowId |[NULL] |
SQL using pivot to give me the column structure:
select [displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId]
from
(
SELECT [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
FROM tblFlatTable AS t
JOIN tblFlatColumns c
ON t.flatTableId = c.flatTableId
JOIN tblFlatColumnValues cv
ON cv.flatColumnId = c.flatColumnId
WHERE (t.flatTableId = 1) AND (t.isActive = 1)
AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
MIN([columnValue])
FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3],[rowId])
) as P
Result:
|displayOrder|itemDescription|itemName |price1|price2|price3|rowId |
| ---------- |:-------------:|:------------:|:----:|:----:|:----:|-----:|
|1 |Green |Polo Shirt |4.2 |5.3 |7.5 |[NULL]|
|3 |Red watch |Digital Watch |4.0 |2.0 |1.5 |[NULL]|
I understand why I'm getting the NULL value for rowId. What I'm stuck on is how to pull the value for rowId from the raw data and add it to my structure. I'm not sure it's even possible, as I've looked at many examples and none seem to do this.
It looks obvious now!
I'm now not including rowId as part of my flat structure.
| rowId |columnName |ColumnValue |
| ---------------- |:---------------:| -----------:|
| 1 |itemNo |1 |
| 1 |itemName |Polo Shirt |
| 1 |itemDescription |Green |
| 1 |price1 |4.2 |
| 1 |price2 |5.3 |
| 1 |price3 |7.5 |
| 1 |displayOrder |1 |
| 2 |itemNo |12 |
| 2 |itemName |Digital Watch|
| 2 |itemDescription |Red Watch |
| 2 |price1 |4.0 |
| 2 |price2 |2.0 |
| 2 |price3 |1.5 |
| 2 |displayOrder |3 |
I've updated the SQL, you can see I'm pulling in the rowId from tblFlatColumnValues
select [rowId],[displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3]
from
(
SELECT cv.rowId, [columnName], [columnValue] , row_number() over(partition by c.columnName order by cv.rowId) as rn
FROM tblFlatTable AS t
JOIN tblFlatColumns c
ON t.flatTableId = c.flatTableId
JOIN tblFlatColumnValues cv
ON cv.flatColumnId = c.flatColumnId
WHERE (t.flatTableId = 1) AND (t.isActive = 1)
AND (c.isActive = 1) AND (cv.isActive = 1)
) as S
Pivot
(
MIN([columnValue])
FOR columnName IN ([displayOrder],[itemDescription],[itemName],[itemNo],[price1],[price2],[price3])
) as P
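The reason this works is that any column selected in the source query but not named in the PIVOT clause (here rowId) becomes a grouping column. The sketch below shows the same idea with an explicit GROUP BY, using Python's standard-library sqlite3 module and conditional aggregation because SQLite has no PIVOT operator. The single table `flat` and the shortened column list are simplifications introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat (rowId INTEGER, columnName TEXT, columnValue TEXT)")
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?)",
    [
        (1, "itemNo", "1"),
        (1, "itemName", "Polo Shirt"),
        (1, "price1", "4.2"),
        (2, "itemNo", "12"),
        (2, "itemName", "Digital Watch"),
        (2, "price1", "4.0"),
    ],
)

# Trusted, hard-coded column names; one MIN(CASE ...) per output column.
columns = ["itemNo", "itemName", "price1"]
select_list = ", ".join(
    f"MIN(CASE WHEN columnName = '{c}' THEN columnValue END) AS {c}" for c in columns
)

# Grouping by rowId yields one output row per source row id.
rows = conn.execute(
    f"SELECT rowId, {select_list} FROM flat GROUP BY rowId ORDER BY rowId"
).fetchall()
print(rows)
```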