How to select entries by comparing two lists? - SQL

I need to select all entries where any item of a list appears in some column.
Here is MY_TABLE:
COLUMN1 COLUMN2
1 item1, item2, item3, item4
2 item5, item6, item7, item8
3 item9, item10, item11, item12
4 item13, item14, item15, item16
5 item17, item18, item19, item20
So I need something like:
DECLARE @MY_LIST VARCHAR(100)
SET @MY_LIST = 'item1, item15'
SELECT * FROM MY_TABLE WHERE @MY_LIST IN COLUMN2 -- pseudocode, not valid SQL
and it should return:
COLUMN1 COLUMN2
1 item1, item2, item3, item4
4 item13, item14, item15, item16
MY_LIST could be an array, and the data in COLUMN2 could be converted into an array as well.
So if any item appears in both MY_LIST and COLUMN2, that entry should be selected.
Thank you very much for any response.
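One way to get this, sketched under the assumption of SQL Server 2016+ (the DECLARE/SET syntax above suggests SQL Server; older versions would need a user-defined split function instead of STRING_SPLIT):
DECLARE @MY_LIST VARCHAR(100) = 'item1,item15';

SELECT t.COLUMN1, t.COLUMN2
FROM MY_TABLE AS t
WHERE EXISTS (
    -- split both the parameter and the comma-separated column into rows,
    -- and keep the row if the two sets share at least one item
    SELECT 1
    FROM STRING_SPLIT(@MY_LIST, ',') AS wanted
    JOIN STRING_SPLIT(t.COLUMN2, ',') AS found
        ON LTRIM(wanted.value) = LTRIM(found.value)
);
LTRIM strips the space after each comma in COLUMN2. Storing one item per row in a child table would make this kind of query both simpler and faster.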

Related

SQL group-by largest

I have a table that has co-occurrence counts by objects that looks like the following:
col1 col2 count
item1 item2 3
item3 item2 1
item1 item4 1
item2 item3 2
I would like to do a group-by largest, i.e. take the top n counts for each value of col1. However, if I do that on the above table, then since not all object pairs appear in both orders, the result would be the following (which is not correct):
col1 col2 count
item1 item2 3
item3 item2 1
item2 item3 2
If I swap the columns and then add the swapped rows back to the same table, this would be the result:
col1 col2 count
item1 item2 3
item3 item2 1
item1 item4 1
item2 item3 2
item2 item1 3
item2 item3 1
item4 item1 1
item3 item2 2
And the group by would then yield the correct result:
col1 col2 count
item1 item2 3
item2 item1 3
item4 item1 1
item3 item2 2
What would the proper query be for producing this kind of group-by? Am I correct that the columns would need to be swapped and concatenated, or is there a better way to go about this? (I'm using Postgres.)
In the above I am showing a group-by top 1 for the sake of keeping the example simple; in reality this is a group-by top 10.
This answers the original version of the question.
I think you want:
-- emit each pair in both orders, then take the max count per ordered pair
select v.col1, v.col2, max(t.count)
from t cross join lateral
     (values (t.col1, t.col2), (t.col2, t.col1)) v(col1, col2)
group by v.col1, v.col2;
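For the top-10 case mentioned in the question, one way (a sketch against the same table t, assuming Postgres) is to rank the symmetrized pairs per col1 with a window function:
select col1, col2, cnt
from (
    select v.col1, v.col2, max(t.count) as cnt,
           -- rank pairs within each col1 by their count, largest first
           row_number() over (partition by v.col1
                              order by max(t.count) desc) as rn
    from t cross join lateral
         (values (t.col1, t.col2), (t.col2, t.col1)) v(col1, col2)
    group by v.col1, v.col2
) ranked
where rn <= 10;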

How to get total record count for records that match 3 different criteria in a single query

I have a table similar to this
Item1 Item2
yes yes
yes no
yes yes
yes yes
etc., etc.
I need to get the count of the records that have yes for both Item1 and Item2, and also the counts for records that have yes for just Item1 or just Item2, without duplicate records in the final query. Any suggestions will be greatly appreciated, as always.
Perhaps you just want group by:
select item1, item2, count(*)
from t
group by item1, item2;
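On the sample rows above, this returns (yes, yes, 3) and (yes, no, 1).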
If you specifically want to combine values, you could do:
select sum(case when item1 = 'yes' and item2 = 'yes' then 1 else 0 end) as two_yesses,
       sum(case when (item1 = 'yes' or item2 = 'yes') and item1 <> item2
                then 1 else 0
           end) as one_yes
from t;
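On the same sample rows, two_yesses is 3 and one_yes is 1 (the single yes/no row).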

How do I use SQL scripts in R

I need to write a SQL query.
Here are my tables:
x <- read.csv("C:/Users/Admin/Downloads/Set 1-1.csv",sep=",",dec=".")
y <- read.csv("C:/Users/Admin/Downloads/Set 1-2 - Copy.csv",sep=",",dec=".")
y$score <- 1
I tried joining them:
library("sqldf")
select clientid,emailmessageid,null cnttrn,idatediff,null score from x
union all select clientid,emailmessageid,cnttrn,idatediff,score from y
But I get the following errors:
select clientid,emailmessageid,null cnttrn,idatediff,null score from x
Error: unexpected symbol in "select clientid"
union all select
clientid,emailmessageid,cnttrn,idatediff,score from y
Error:
unexpected symbol in "union all"
Please help to correct it. Thank you.
dput(x)
ClientID EmailMessageId MinDate MaxDate IdSlip WwsCreatedDate ProductArticle ProductGroupName MainProductGroupName CategoryGroupName QtytItems SumAmount iDateDiff
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 580.0 -342
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 3190.0 -342
dput(y)
ClientID EmailMessageId CntTrn iDateDiff score
86139F31664463A8B7592B6887B731A9FC2C3489BB1756A5BF334CFDEA4EF604 9EDCC1391C208BA0 1 4 1
BD483D69913E3EBFE5FBA87A1FFAB7DCD061055FFB4342C2F27AC01F36833254 EF72D53990BC4805 1 5 1
0B3B2F06C3033B3AFD83BA59B405BCC79BC69801FD3B69931F117B8D754A80EB 9EDCC1391C208BA0 1 3 1
This runs without errors for me. The key difference is that the query is passed to sqldf() as a quoted string; your errors come from typing the SQL directly at the R prompt, which R's parser cannot read. Is the result correct?
library(sqldf)
y <- read.table(text = "ClientID EmailMessageId CntTrn iDateDiff score
86139F31664463A8B7592B6887B731A9FC2C3489BB1756A5BF334CFDEA4EF604 9EDCC1391C208BA0 1 4 1
BD483D69913E3EBFE5FBA87A1FFAB7DCD061055FFB4342C2F27AC01F36833254 EF72D53990BC4805 1 5 1
0B3B2F06C3033B3AFD83BA59B405BCC79BC69801FD3B69931F117B8D754A80EB 9EDCC1391C208BA0 1 3 1", header = TRUE)
x <- read.table(header = TRUE, text = "ClientID EmailMessageId MinDate MaxDate IdSlip WwsCreatedDate ProductArticle ProductGroupName MainProductGroupName CategoryGroupName QtytItems SumAmount iDateDiff
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 580.0 -342
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 3190.0 -342")
sqldf("
SELECT
ClientId,
EmailMessageId,
null CntTrn,
iDateDiff,
null Score
FROM x
UNION ALL
SELECT
ClientId,
EmailMessageId,
CntTrn,
iDateDiff,
Score
FROM y")

SSAS DAX Not Ordering Correctly

Can anyone explain why this statement is not ordering correctly, please?
Sample Workbook:- http://1drv.ms/1TRizj8
Basic query:-
EVALUATE
SUMMARIZE(
Data
,'data'[item]
,"TotalAmount", Sum(Data[Amount])
)
Result:-
Item TotalAmount
Item1 3.95128609469091
Item2 4.24529815278904
Item3 4.19327473518058
Item4 4.11105035459714
Item5 4.41249125008144
Item6 4.17408171753715
Altered Query:-
EVALUATE
SUMMARIZE(
Data
,'data'[item]
,"TotalAmount", Sum(Data[Amount])
)
order by "TotalAmount"
Actual Result:-
Item TotalAmount
Item1 3.95128609469091
Item2 4.24529815278904
Item3 4.19327473518058
Item4 4.11105035459714
Item5 4.41249125008144
Item6 4.17408171753715
Expected:-
Item TotalAmount
Item1 3.951286095
Item4 4.111050355
Item6 4.174081718
Item3 4.193274735
Item2 4.245298153
Item5 4.41249125
Hopefully I'm missing something really obvious here... ultimately I just want to get a TOPN() based on the biggest sellers of my real data, but whenever I try to order by, it goes all squiffy :/
Worked it out with fresh eyes this morning: it needed square brackets around TotalAmount(!). In DAX, "TotalAmount" is just a string literal, the same constant for every row, so ordering by it does nothing; [TotalAmount] actually references the summarized column.
Query:
EVALUATE
SUMMARIZE(
Data
,'data'[item]
,"TotalAmount", Sum(Data[Amount])
)
order by [TotalAmount]
Results:
Item TotalAmount
Item1 3.95128609469091
Item4 4.11105035459714
Item6 4.17408171753715
Item3 4.19327473518058
Item2 4.24529815278904
Item5 4.41249125008144
sigh
:)
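With the ordering fixed, the eventual TOPN() goal might look like the following sketch (same Data table as the workbook; TOPN's third argument orders by the summarized column, DESC for the biggest sellers, and the trailing ORDER BY sorts the output, since TOPN itself does not guarantee row order):
EVALUATE
TOPN(
    10
    ,SUMMARIZE(
        Data
        ,'data'[item]
        ,"TotalAmount", SUM(Data[Amount])
    )
    ,[TotalAmount], DESC
)
ORDER BY [TotalAmount] DESC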

How to express this query in SQL Server 2008

I have a table called Reporting with the following columns:
OutletId CampaignId ItemId Qty
10 1 Item1 12
10 1 Item2 13
10 1 Item3 14
20 2 Item4 10
20 2 Item5 11
20 2 Item6 12
20 2 Item7 8
Now I want to retrieve the data in this format.
When the user selects CampaignId = 1:
OutletId CampaignId Item1 Item2 Item3
10 1 12 13 14
When the user selects CampaignId = 2:
OutletId CampaignId Item4 Item5 Item6 Item7
20 2 10 11 12 8
Here the items for a campaign are not fixed.
I think it is efficient this way:
SELECT *
FROM
(
    SELECT OutletId, CampaignId, ItemId, Qty
    FROM Reporting
) AS p
PIVOT
(
    SUM(Qty)
    FOR ItemId IN (SELECT ItemId FROM Reporting WHERE CampaignId = 1)
) AS pvt
Comment: here CampaignId = 1, CampaignId = 2, or whatever value you want. (Note, though, that T-SQL's PIVOT does not accept a subquery in the IN list; the column names must be listed literally, so in practice this idea has to be realized with dynamic SQL.)
A possible solution would be:
SELECT *
FROM
(
    SELECT OutletId, CampaignId, ItemId, Qty
    FROM test
) AS p
PIVOT
(
    SUM(Qty)
    FOR ItemId IN (Item1, Item2, Item3, Item4)
) AS pvt
But obviously, as commented before, this is not practical because you don't always know the items: either redesign your table, or, if using PIVOT, build the pivot column list dynamically with dynamic SQL, as sketched below.
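A sketch of that dynamic approach (assuming SQL Server 2008, the Reporting table from the question, and the usual STUFF/FOR XML trick for string aggregation):
DECLARE @CampaignId INT = 1;
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- build the pivot column list for the chosen campaign, e.g. [Item1],[Item2],[Item3]
SELECT @cols = STUFF((
    SELECT DISTINCT ',' + QUOTENAME(ItemId)
    FROM Reporting
    WHERE CampaignId = @CampaignId
    FOR XML PATH('')
), 1, 1, '');

-- splice the column list into the PIVOT query and run it
SET @sql = N'SELECT OutletId, CampaignId, ' + @cols + N'
FROM (SELECT OutletId, CampaignId, ItemId, Qty
      FROM Reporting
      WHERE CampaignId = @cid) AS p
PIVOT (SUM(Qty) FOR ItemId IN (' + @cols + N')) AS pvt;';

EXEC sp_executesql @sql, N'@cid INT', @cid = @CampaignId;
QUOTENAME brackets the item names so they are safe to splice into the generated SQL.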