Select Top User over a list of Pages - sql

I have a table containing records of Users' internet history. The table's structure contains the User_ID, the Page Accessed, and the Date Accessed of the page. For Example:
+==========================================+
|User_ID | Page_Accessed | Date_Accessed |
+==========================================+
|Johh.Doe | Google | 1/1/2015 |
|Johh.Doe | Google | 1/1/2015 |
|Suzy.Lue | Google | 7/11/2015 |
|Suzy.Lue | Wikipedia | 4/23/2015 |
|Babe Ruth| StackOverflow | 9/1/2015 |
+==========================================+
I am currently trying to use a SQL query that uses:
RANK() OVER (PARTITION BY [Page Accessed] ORDER BY Count(DateAcc))
Then I use a PIVOT() over the various sites. However, after selecting the records WHERE (Num = 1) from the PIVOT() and applying a GROUP BY [Rank], I end up with a result similar to:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| NULL | NULL |
| 1 | NULL | Suzy Lue | NULL |
| 1 | NULL | NULL | Babe Ruth |
+=================================================+
Instead I need to reformat my output as:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| Suzy Lue | Babe Ruth |
+=================================================+
My Current Query:
SELECT Rank, Google, Wikipedia, StackOverflow
FROM(
SELECT TOP (100) PERCENT User_ID, Page_Accessed, COUNT(Date_Accessed) AS Views,
RANK() OVER (PARTITION BY Page_Accessed ORDER BY Count(Date_Accessed) DESC) AS Rank
FROM Record_Table
GROUP BY dbo.location_key.subSite, dbo.user_info_list_parse.Name
ORDER BY Views DESC) AS tb
PIVOT (
max(tb.User_ID) FOR
Page_Accessed IN ( Google, Wikipedia, StackOverflow)
) pvt
WHERE (Num = 1)
Are there any creative solutions to obtain this result?

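You have probably found a solution already, but for your information and for others reading this, let me strip the noise from the query. There is no need for ORDER BY or TOP (100) PERCENT, and the Views column should go: PIVOT implicitly groups by every source column that is neither pivoted nor aggregated, so a leftover column such as Views can split the output into several rows. I would simplify the query as follows: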
CREATE TABLE InternetHistory
(
[User_ID] varchar(20),
[Page_Accessed] varchar(20),
[Date_Accessed] datetime
)
INSERT InternetHistory VALUES
('Johh.Doe', 'Google', '2015-01-01'),
('Johh.Doe', 'Google', '2015-01-01'),
('Suzy.Lue', 'Google', '2015-07-11'),
('Suzy.Lue', 'Wikipedia', '2015-04-23'),
('Babe Ruth', 'StackOverflow', '2015-09-01')
SELECT * FROM
(
SELECT [User_ID], [Page_Accessed], RANK() OVER (PARTITION BY [Page_Accessed] ORDER BY COUNT(*) DESC) Ranking
FROM InternetHistory
GROUP BY [User_ID], [Page_Accessed]
) AS Src
PIVOT
(
MAX([User_Id]) FOR [Page_Accessed] IN ([Google], [Wikipedia], [StackOverflow])
) AS Pvt
WHERE Ranking = 1
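For comparison, the same pivot can be written as conditional aggregation, which makes the grouping explicit: the only GROUP BY column is the rank, so each rank yields exactly one row. This is a minimal sketch, not the method used above; the CTE names history and ranked are made up for illustration, and the sample rows are inlined so the query runs on its own in SQL Server.

```sql
-- Sample rows inlined; "history" and "ranked" are hypothetical names.
WITH history (user_id, page_accessed) AS (
    SELECT user_id, page_accessed
    FROM (VALUES
        ('Johh.Doe',  'Google'),
        ('Johh.Doe',  'Google'),
        ('Suzy.Lue',  'Google'),
        ('Suzy.Lue',  'Wikipedia'),
        ('Babe Ruth', 'StackOverflow')
    ) AS v (user_id, page_accessed)
),
ranked AS (
    -- One row per (user, page), ranked by visit count within each page.
    SELECT user_id,
           page_accessed,
           RANK() OVER (PARTITION BY page_accessed ORDER BY COUNT(*) DESC) AS ranking
    FROM history
    GROUP BY user_id, page_accessed
)
-- Each CASE keeps the user only for its own page; MAX collapses the NULLs.
SELECT ranking,
       MAX(CASE WHEN page_accessed = 'Google'        THEN user_id END) AS Google,
       MAX(CASE WHEN page_accessed = 'Wikipedia'     THEN user_id END) AS Wikipedia,
       MAX(CASE WHEN page_accessed = 'StackOverflow' THEN user_id END) AS StackOverflow
FROM ranked
WHERE ranking = 1
GROUP BY ranking;
```

If two users tie for first place on a page, both get rank 1 and MAX simply keeps the alphabetically last name, the same as the PIVOT version.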


Self join to create a new column with updated records

I am trying to write a SQL query to get the start date for employees in a store. As seen in the first screenshot, employee 5041 previously had the number A0EH, but when the number was updated, the employee's start date was updated as well. This affects the metric of total duration in the store.
I am trying to get to the output below but haven't been able to figure out how to get this view.
This is the code I was trying but I am not getting the correct output.
select
esd.employee_number,
(case when esd.old_employee_number is null then es.employee_number else es.old_employee_number end) as old_employee_number,
esd.entity_id,
esd.original_start_date
from earliest_start_date as esd
left join earliest_start_date as es
on (es.employee_number = esd.old_employee_number)
How do I solve this in SQL?
Redshift reportedly supports recursion via the WITH clause, and MariaDB 10.5 has similar support. The query below was tested as a fully working test case on MariaDB 10.5. For the Redshift details, see the Amazon Redshift documentation pages "WITH clause" and "Window functions". Here's an example:
WITH RECURSIVE cte (employee_number, original_no, entity_id, original_start_date, n) AS (
SELECT employee_number, employee_number, entity_id, original_start_date, 1 FROM earliest_start_date WHERE old_employee_number IS NULL
UNION ALL
SELECT new_tbl.employee_number, cte.original_no, cte.entity_id, cte.original_start_date, n+1
FROM earliest_start_date new_tbl
JOIN cte
ON cte.employee_number = new_tbl.old_employee_number
)
, xrows AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY n DESC) AS rn
FROM cte
)
SELECT * FROM xrows WHERE rn = 1
;
Result:
+-----------------+-------------+-----------+---------------------+------+----+
| employee_number | original_no | entity_id | original_start_date | n | rn |
+-----------------+-------------+-----------+---------------------+------+----+
| XXXX | XXXX | 88 | 2021-09-02 | 1 | 1 |
| 5041 | A0EH | 96 | 2021-09-05 | 2 | 1 |
+-----------------+-------------+-----------+---------------------+------+----+
2 rows in set
Raw test data:
SELECT * FROM earliest_start_date;
+-----------------+---------------------+-----------+---------------------+
| employee_number | old_employee_number | entity_id | original_start_date |
+-----------------+---------------------+-----------+---------------------+
| 5041 | A0EH | 96 | 2021-09-10 |
| A0EH | NULL | 96 | 2021-09-05 |
| XXXX | NULL | 88 | 2021-09-02 |
+-----------------+---------------------+-----------+---------------------+
Note that the logic assumes each employee_number is unique. In its current form it can't handle cases where an employee_number is reused by the same employee, or assigned again to a different employee, without adjusting prior data. There may not be enough detail in the current structure to handle those cases.

Add a subtotal column for aggregated columns

Here's my dataset of trades, traders and counterparties:
TRADER_ID | TRADER_NAME | EXEC_BROKER | TRADE_AMOUNT | TRADE_ID
ABC123 | Jules Winnfield | GOLD | 10000 | ASDADAD
XDA241 | Jimmie Dimmick | GOLD | 12000 | ASSVASD
ADC123 | Vincent Vega | BARC | 10000 | ZXCZCX
ABC123 | Jules Winnfield | BARC | 15000 | ASSXCQA
ADC123 | Vincent Vega | CRED | 250000 | RFAQQA
ABC123 | Jules Winnfield | CRED | 5000 | ASDQ23A
ABC123 | Jules Winnfield | GOLD | 5000 | AVBDQ3A
I'm looking to produce a repeatable monthly report that gives me a view of trading activity aggregated at the counterparty (the EXEC_BROKER field) level, with subtotals - as shown below:
TRADER_ID | TRADER_NAME | NO._OF_CCP_USED | CCP | TRADED_AMT_WITH_CCP | VALUE_OF_TOTAL_TRADES | TRADES_WITH_CCP | TOTAL_TRADES
ABC123 | Jules Winnfield | 3 | GOLD | 15000 | 35000 | 2 | 4
ABC123 | Jules Winnfield | 3 | BARC | 15000 | 35000 | 1 | 4
ABC123 | Jules Winnfield | 3 | CRED | 5000 | 35000 | 1 | 4
...and so on for the rest.
The idea is to aggregate the number of trades per counterparty (which I have done using a count function), and the sum of traded amounts with the ccp, but I'm struggling to get the 'subtotal' field next to each trader as shown in my desired output above - so you can see here that Jules has dealt with 3 counterparties in total, with 4 trades between them, and a collective amount of 35000.
I have tried combining aggregate functions with OVER (PARTITION BY ...) window functions, but to no avail.
SELECT
OT.TRADER_ID,
OT.TRADER_NAME,
OT.EXEC_BROKER,
SUM(OT.TRADE_AMOUNT) AS VALUE_OF_TOTAL_TRADES,
COUNT(OT.TRADE_ID) AS TOTAL_TRADES,
COUNT(OT.EXEC_BROKER) OVER (PARTITION BY OT.TRADER_ID) AS [NO._OF_CCP_USED],
SUM(OT.TRADE_AMOUNT) OVER (PARTITION BY OT.EXEC_BROKER) AS TRADED_AMT_WITH_CCP,
COUNT(OT.TRADE_ID) OVER (PARTITION BY OT.EXEC_BROKER) AS TRADES_WITH_CCP
FROM dbo.ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.EXEC_BROKER, OT.TRADE_AMOUNT, OT.TRADE_ID
The code above runs but returns millions of rows. When I remove the partition by lines, I get the desired result minus the subtotal columns I'm looking for.
Any suggestions please? Thanks very much!
EDIT:
Here is the final code that gave me the desired output. I'm adding it to my question (thanks to Gordon Linoff) so that others can benefit:
SELECT
OT.TRADER_ID,
OT.TRADER_NAME,
OT.EXEC_BROKER,
RANK() OVER (PARTITION BY OT.TRADER_ID ORDER BY SUM(OT.TRADE_AMOUNT) DESC) AS CCP_RANK,
SUM(OT.TRADE_AMOUNT) AS TRADED_AMT_WITH_CCP,
SUM(SUM(OT.TRADE_AMOUNT)) OVER (PARTITION BY OT.TRADER_ID) AS VALUE_OF_TOTAL_TRADES,
COUNT(*) OVER (PARTITION BY OT.TRADER_ID) AS NUM_OF_CCP_USED,
SUM(COUNT(OT.TRADE_ID)) OVER (PARTITION BY OT.TRADER_ID) AS TOTAL_TRADES
FROM dbo.ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.EXEC_BROKER
You seem to want:
SELECT OT.TRADER_ID, OT.TRADER_NAME, OT.CCP,
COUNT(*) OVER (PARTITION BY OT.TRADER_ID) as NUM_CCP,
SUM(OT.TRADED_AMT) AS TRADED_AMT_WITH_CCP,
SUM(SUM(OT.TRADED_AMT)) OVER (PARTITION BY OT.TRADER_ID) AS VALUE_OF_TOTAL_TRADES,
COUNT(OT.TRADE_ID) AS CCP_TRADES,
SUM(COUNT(OT.TRADE_ID)) OVER (PARTITION BY OT.TRADER_ID) AS TOTAL_TRADES
FROM ORDERS_TRADES OT
GROUP BY OT.TRADER_ID, OT.TRADER_NAME, OT.CCP;
I'm not sure what your query has to do with the results you want. The columns have little to do with what you are asking.
Here is a db<>fiddle.
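To see why the nested aggregates work: GROUP BY first collapses the trades to one row per trader and broker, and the window functions then run over those grouped rows. The sketch below uses the column names from the question and a hypothetical scratch table, orders_trades_demo, loaded with the question's sample rows.

```sql
-- Hypothetical scratch table mirroring the sample data in the question.
CREATE TABLE orders_trades_demo (
    trader_id    VARCHAR(10),
    trader_name  VARCHAR(20),
    exec_broker  CHAR(4),
    trade_amount DECIMAL(12,2),
    trade_id     VARCHAR(10) PRIMARY KEY
);

INSERT INTO orders_trades_demo VALUES
('ABC123', 'Jules Winnfield', 'GOLD',  10000, 'ASDADAD'),
('XDA241', 'Jimmie Dimmick',  'GOLD',  12000, 'ASSVASD'),
('ADC123', 'Vincent Vega',    'BARC',  10000, 'ZXCZCX'),
('ABC123', 'Jules Winnfield', 'BARC',  15000, 'ASSXCQA'),
('ADC123', 'Vincent Vega',    'CRED', 250000, 'RFAQQA'),
('ABC123', 'Jules Winnfield', 'CRED',   5000, 'ASDQ23A'),
('ABC123', 'Jules Winnfield', 'GOLD',   5000, 'AVBDQ3A');

SELECT trader_id,
       trader_name,
       exec_broker,
       -- Plain aggregates: computed per (trader, broker) group.
       SUM(trade_amount)                                    AS traded_amt_with_ccp,
       COUNT(*)                                             AS trades_with_ccp,
       -- Window functions over the grouped rows: the inner aggregate is the
       -- per-group value, the outer SUM adds those values up per trader.
       SUM(SUM(trade_amount)) OVER (PARTITION BY trader_id) AS value_of_total_trades,
       SUM(COUNT(*))          OVER (PARTITION BY trader_id) AS total_trades,
       -- Counts grouped rows, i.e. distinct brokers per trader.
       COUNT(*)               OVER (PARTITION BY trader_id) AS num_ccp_used
FROM orders_trades_demo
GROUP BY trader_id, trader_name, exec_broker;
```

For trader ABC123 this gives three rows (GOLD 15000 with 2 trades, BARC 15000 with 1, CRED 5000 with 1), each carrying the subtotals 35000, 4 trades and 3 counterparties, which matches the desired output in the question.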
Making some assumptions about the nomenclature, here is a solution that doesn't use anything too fancy so it's easy to maintain, though it's not the most efficient:
create table trades
(
TRADER_ID varchar(10),
TRADER_NAME varchar(20),
CCP char(4),
TRADED_AMT decimal(10,2),
TRADE_ID varchar(10) primary key
);
insert trades
values
('ABC123', 'Jules Winnfield', 'GOLD', 10000 , 'ASDADAD'),
('XDA241', 'Jimmie Dimmick ', 'GOLD', 12000 , 'ASSVASD'),
('ADC123', 'Vincent Vega ', 'BARC', 10000 , 'ZXCZCX'),
('ABC123', 'Jules Winnfield', 'BARC', 15000 , 'ASSXCQA'),
('ADC123', 'Vincent Vega ', 'CRED', 250000, 'RFAQQA'),
('ABC123', 'Jules Winnfield', 'CRED', 5000 , 'ASDQ23A'),
('ABC123', 'Jules Winnfield', 'GOLD', 5000 , 'AVBDQ3A');
with trader_totals as
(
select trader_id,
distinct_ccps = count(distinct CCP),
total_amt = sum(traded_amt),
total_count = count(*)
from trades
group by trader_id
)
select trader_id = tr.trader_id,
trader_name = trader_name,
distinct_CCP_count = tt.distinct_ccps,
CCP = tr.CCP,
this_CCP_traded_amt = sum(traded_amt),
total_traded_amt = tt.total_amt,
this_CCP_traded_count = count(*),
total_traded_count = tt.total_count
from trades tr
join trader_totals tt on tt.trader_id = tr.trader_id
group by tr.trader_id,
tr.trader_name,
tr.CCP,
tt.distinct_ccps,
tt.total_amt,
tt.total_count

SQL Server stored procedure inserting duplicate rows

I have a table with a column GetDup, and I'd like to duplicate records based on the value of this column. For example, if the value in GetDup is 1, duplicate the record once; if it is 2, duplicate the record twice, and so on. The statement has to be a looping statement.
What would be a good way to write a stored procedure for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you want to call it that), is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table are both things that should be in every database you have. They are extremely useful.
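As a rough illustration of "add more values and more CROSS JOINs", here is a sketch of the same idea scaled to 1000 numbers: ten seed rows cross-joined three times. It assumes SQL Server; ORDER BY (SELECT NULL) is used only because ROW_NUMBER() requires an ORDER BY and the order of the seed rows does not matter here.

```sql
WITH z AS (
    -- Ten seed rows; the values themselves are irrelevant.
    SELECT x
    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS v(x)
),
numTable AS (
    -- 10 * 10 * 10 = 1000 rows, numbered from 0.
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS num
    FROM z AS z1
    CROSS JOIN z AS z2
    CROSS JOIN z AS z3
)
-- Sanity check: the table should run from 0 to 999.
SELECT MIN(num) AS lowest, MAX(num) AS highest, COUNT(*) AS how_many
FROM numTable;
```

The check should report 0, 999 and 1000. For anything used regularly, a permanent numbers table is likely the better choice than rebuilding the CTE in every query.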
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem, this won't duplicate rows, this will only duplicate the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may have to adjust the arithmetic. You say above that if GetDup is 1 you want one duplicate. If that means the output should hold the value twice (the original plus one copy), then add one in the repeat() or replicate() call, i.e. replicate(mycolumn,getdup+1). Oracle SQL is a little different since it uses rpad(): there the target length becomes (getdup+1)*length(mycolumn).
In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs (and the recursive keyword is not used in some of them).
So, you want a recursive solution:
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from table
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
insert into table (col1, col2, col3)
select Getdup, CustomerName, CustomerAdd
from t
order by getdup
option (maxrecursion 0);
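To make the recursion concrete, here is a self-contained sketch against the sample data from the question. It assumes SQL Server; the table name customers_demo and the column copy_no are made up for illustration. The anchor member emits each row once as copy 0, and the recursive member keeps adding copies until copy_no reaches Getdup.

```sql
-- Hypothetical demo table with the two sample rows from the question.
CREATE TABLE customers_demo (
    Getdup       INT,
    CustomerName VARCHAR(10),
    CustomerAdd  VARCHAR(20)
);

INSERT INTO customers_demo (Getdup, CustomerName, CustomerAdd)
VALUES (1, 'John', '123 SomeWhere'),
       (2, 'Bob',  '987 SomeWhere');

WITH copies AS (
    -- Anchor member: copy number 0 is the original row.
    SELECT Getdup, CustomerName, CustomerAdd, 0 AS copy_no
    FROM customers_demo
    UNION ALL
    -- Recursive member: one more copy until copy_no reaches Getdup.
    SELECT Getdup, CustomerName, CustomerAdd, copy_no + 1
    FROM copies
    WHERE copy_no < Getdup
)
SELECT Getdup, CustomerName, CustomerAdd
FROM copies
ORDER BY Getdup, copy_no
OPTION (MAXRECURSION 0);   -- lift the default limit of 100 recursion levels
```

This returns John twice and Bob three times, as in the desired output. If the copies are to be inserted back into the same table, adding WHERE copy_no > 0 to the final SELECT would avoid re-inserting the original rows.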

More efficient way to query shortest string value associated with each value in another column in Hive QL

I have a table in Hive containing store names, order IDs, and User IDs (as well as some other columns including item ID). There is a row in the table for every item purchased (so there can be more than one row per order if the order contains multiple items). Order IDs are unique within a store, but not across stores. A single order can have more than one user ID associated with it.
I'm trying to write a query that will return a list of all stores and order IDs and the shortest user ID associated with each order.
So, for example, if the data looks like this:
STORE | ORDERID | USERID | ITEMID
------+---------+--------+-------
a | 1 | bill | abc
a | 1 | susan | def
a | 2 | jane | abc
b | 1 | scott | ghi
b | 1 | tony | jkl
Then the output would look like this:
STORE | ORDERID | USERID
------+---------+-------
a | 1 | bill
a | 2 | jane
b | 1 | tony
I've written a query that will do this, but I feel like there must be a more efficient way to go about it. Does anybody know a better way to produce these results?
This is what I have so far:
select
users.store, users.orderid, users.userid
from
(select
store, orderid, userid, length(userid) as len
from
sales) users
join
(select distinct
store, orderid,
min(length(userid)) over (partition by store, orderid) as len
from
sales) len on users.store = len.store
and users.orderid = len.orderid
and users.len = len.len
This will probably work for you. It reaches your goal with a single SELECT and no self-join, using first_value() to pick the shortest userid in each (store, orderid) partition:
select distinct
store, orderid,
first_value(userid) over(partition by store, orderid order by length(userid) asc) f_val
from
sales;
The result will be:
store orderid f_val
a 1 bill
a 2 jane
b 1 tony
Probably rank() is the best way:
select s.*
from (select s.*, rank() over (partition by store, orderid order by length(userid)) as seqnum
from sales s
) s
where seqnum = 1;

SQL Server Find the date in joining order

I am using MS SQL Server. There are two tables:
membership
+----+----------------+-------------+-------+
| id | membershipName | createddate | price |
+----+----------------+-------------+-------+
| 1  | Swimming       | 2010-01-01  | 30    |
| 2  | Swimming       | 2010-05-01  | 32    |
| 3  | Swimming       | 2011-01-01  | 35    |
| 4  | Swimming       | 2012-01-01  | 40    |
+----+----------------+-------------+-------+
member
+----+------------+------------+-------------+
| id | memberName | membership | joiningDate |
+----+------------+------------+-------------+
| 0  | Andy       | Swimming   | 2008-02-02  |
| 1  | John       | Swimming   | 2010-02-02  |
| 2  | Andy       | Swimming   | 2011-02-02  |
| 3  | Alice      | Swimming   | 2015-02-02  |
+----+------------+------------+-------------+
I want to find each member's membership price for the right period of time
e.g
Andy return NULL
John return 30
Alice return 40
the best logic is to see
if the joiningDate is in between two start date
if yes choose the earlier date
if not
if the joining date is before the earlier date then use the earliest date
if the joining date is after the latest date then use the latest date
I am a Java programmer, and doing this in SQL is quite tricky for me. Any hint would be nice!
edit 1: sorry I forgot to consider month
edit 2: added desirable result
I hope I understood you correctly. Try this:
SELECT TOP 1 ms.price
FROM membership ms
LEFT JOIN member m
ON m.joiningDate >= ms.createddate
WHERE m.id = 3
ORDER BY ms.createddate DESC
I hope I got this correctly. You might try it like this:
Temp tables to mock up a test scenario:
CREATE TABLE #membership (id INT, membershipName VARCHAR(100),createddate DATETIME,price DECIMAL(10,4));
INSERT INTO #membership VALUES
(1,'Swimming',{d'2010-01-01'},30)
,(2,'Swimming',{d'2010-05-01'},32)
,(3,'Swimming',{d'2011-01-01'},35)
,(4,'Swimming',{d'2012-01-01'},40);
CREATE TABLE #member (id INT,memberName VARCHAR(100),membership VARCHAR(100),joiningDate DATETIME);
INSERT INTO #member VALUES
(0,'Andy','Swimming',{d'2008-02-02'})
,(1,'John','Swimming',{d'2010-02-02'})
,(2,'Andy','Swimming',{d'2011-02-02'})
,(3,'Alice','Swimming',{d'2015-02-02'});
As you are on SQL Server 2012, you are lucky: you can use LEAD.
The CTE "Intervalls" returns the membership table as is, plus one column holding the moment one second before the next row's createddate. LEAD gives access to a value from a following row. First I subtract one second, then I substitute a very high date where LEAD returns NULL (the last row):
WITH Intervalls AS
(
SELECT *
,ISNULL(DATEADD(SECOND ,-1,LEAD(createddate) OVER(ORDER BY createddate)),{d'2100-01-01'}) AS EndOfIntervall
FROM #membership AS ms
)
--The SELECT reads all members and joins them to the membership rows whose interval (from "Intervalls") contains their joining date. Only the case earlier than the first createddate must be treated specially:
SELECT m.*
,ISNULL(i.price, CASE WHEN m.joiningDate<(SELECT MIN(x.createddate) FROM #membership as x)
THEN (SELECT TOP 1 x.price FROM #membership AS x ORDER BY x.createddate ASC) END) AS price
FROM #member AS m
LEFT JOIN Intervalls AS i ON m.joiningDate BETWEEN i.createddate AND i.EndOfIntervall
UPDATE: A better approach (thanks to Paparis)
SELECT m.*
,ISNULL(Corresponding.price, (SELECT TOP 1 x.price FROM #membership AS x ORDER BY x.createddate ASC)) AS price
FROM #member AS m
OUTER APPLY
(
SELECT TOP 1 ms.price
FROM #membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
) AS Corresponding
UPDATE 2: Even simpler!
SELECT m.*
,ISNULL
(
(
SELECT TOP 1 ms.price
FROM #membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
),
(
SELECT TOP 1 x.price FROM #membership AS x ORDER BY x.createddate ASC
)
) AS price
FROM #member AS m