How do I use SQL scripts in R - sql

I need to write a SQL query.
Here are my tables:
x <- read.csv("C:/Users/Admin/Downloads/Set 1-1.csv",sep=",",dec=".")
y <- read.csv("C:/Users/Admin/Downloads/Set 1-2 - Copy.csv",sep=",",dec=".")
y$score <- 1
I tried joining it
library("sqldf")
select clientid,emailmessageid,null cnttrn,idatediff,null score from x
union all select clientid,emailmessageid,cnttrn,idatediff,score from y
But I get the following errors:
select clientid,emailmessageid,null cnttrn,idatediff,null score from x
Error: unexpected symbol in "select clientid"
union all select clientid,emailmessageid,cnttrn,idatediff,score from y
Error: unexpected symbol in "union all"
Please help to correct it. Thank you.
dput(x)
ClientID EmailMessageId MinDate MaxDate IdSlip WwsCreatedDate ProductArticle ProductGroupName MainProductGroupName CategoryGroupName QtytItems SumAmount iDateDiff
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 580.0 -342
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 3190.0 -342
dput(y)
ClientID EmailMessageId CntTrn iDateDiff score
86139F31664463A8B7592B6887B731A9FC2C3489BB1756A5BF334CFDEA4EF604 9EDCC1391C208BA0 1 4 1
BD483D69913E3EBFE5FBA87A1FFAB7DCD061055FFB4342C2F27AC01F36833254 EF72D53990BC4805 1 5 1
0B3B2F06C3033B3AFD83BA59B405BCC79BC69801FD3B69931F117B8D754A80EB 9EDCC1391C208BA0 1 3 1

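This runs without errors for me. The difference is that the query is passed to sqldf() as a string; SQL typed directly at the R prompt is parsed as R code, which is what produces the "unexpected symbol" errors. I also formatted the query. Is the result correct?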
library(sqldf)
y <- read.table(text = "ClientID EmailMessageId CntTrn iDateDiff score
86139F31664463A8B7592B6887B731A9FC2C3489BB1756A5BF334CFDEA4EF604 9EDCC1391C208BA0 1 4 1
BD483D69913E3EBFE5FBA87A1FFAB7DCD061055FFB4342C2F27AC01F36833254 EF72D53990BC4805 1 5 1
0B3B2F06C3033B3AFD83BA59B405BCC79BC69801FD3B69931F117B8D754A80EB 9EDCC1391C208BA0 1 3 1", header = TRUE)
x <- read.table(header = TRUE, text = "ClientID EmailMessageId MinDate MaxDate IdSlip WwsCreatedDate ProductArticle ProductGroupName MainProductGroupName CategoryGroupName QtytItems SumAmount iDateDiff
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 580.0 -342
3E34C0C9FC05975CC0F01D7A3DEE73D022538FA04B17A0316178E090C04F84A8 894DB62F7B7A6ED2 31.08.2016 31.08.2016 4A19280A1164CF3F4A701EF9AE97A1F1084B611000B94C02 24.09.2015 item1 item2 item3 item4 1 3190.0 -342")
sqldf("
SELECT
ClientId,
EmailMessageId,
null CntTrn,
iDateDiff,
null Score
FROM x
UNION ALL
SELECT
ClientId,
EmailMessageId,
CntTrn,
iDateDiff,
Score
FROM y")
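Since sqldf uses SQLite as its default backend, the same UNION ALL with NULL placeholders can also be checked against SQLite directly. The following is only a sketch using Python's built-in sqlite3 module; the ids 'c1', 'm1' and so on are shortened, made-up stand-ins for the hashes in the question, and x is reduced to the three columns the query actually reads.

```python
import sqlite3


def union_with_placeholders():
    """Stack x and y; the columns x lacks (CntTrn, score) are filled with NULL."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE x (ClientID TEXT, EmailMessageId TEXT, iDateDiff INTEGER);
        CREATE TABLE y (ClientID TEXT, EmailMessageId TEXT, CntTrn INTEGER,
                        iDateDiff INTEGER, score INTEGER);
        -- made-up short ids, not the real hashes from the question
        INSERT INTO x VALUES ('c1', 'm1', -342), ('c1', 'm1', -342);
        INSERT INTO y VALUES ('c2', 'm2', 1, 4, 1), ('c3', 'm3', 1, 5, 1);
    """)
    rows = con.execute("""
        SELECT ClientID, EmailMessageId, NULL AS CntTrn, iDateDiff, NULL AS score
        FROM x
        UNION ALL
        SELECT ClientID, EmailMessageId, CntTrn, iDateDiff, score
        FROM y
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in union_with_placeholders():
        print(row)
```

As in the sqldf call above, the SQL lives inside a string that is handed to the database layer; it is never typed as bare code in the host language.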


SQL group-by largest

I have a table that has cooccurrence counts by objects that looks like the following:
col1 col2 count
item1 item2 3
item3 item2 1
item1 item4 1
item2 item3 2
I would like to do a group-by that keeps the top n counts per col1. However, if I run it on the table above, the result is not correct, because not every object pair appears in both orders:
col1 col2 count
item1 item2 3
item3 item2 1
item2 item3 2
If I swap the columns and then add them back to the same table this would be the result:
col1 col2 count
item1 item2 3
item3 item2 1
item1 item4 1
item2 item3 2
item2 item1 3
item2 item3 1
item4 item1 1
item3 item2 2
And the group by would yield: (the correct result)
col1 col2 count
item1 item2 3
item2 item1 3
item4 item1 1
item3 item2 2
What would be the proper query for producing this kind of group-by? Am I correct that the columns need to be swapped and concatenated, or is there a better way to go about this? (I'm using Postgres.)
The above shows a group-by top 1 to keep the example simple; in reality this is a group-by top 10.
This answers the original version of the question.
I think you want:
select v.col1, v.col2, max(t.count)
from t cross join lateral
(values (col1, col2), (col2, col1)) v(col1, col2)
group by v.col1, v.col2;
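The swap-and-concatenate idea from the question, followed by a top-n filter per col1, can be sketched as below. This is an illustration under assumptions, not the answerer's query: it runs on SQLite through Python's sqlite3 module, uses UNION ALL in place of the Postgres-only lateral VALUES construct, renames the count column to cnt, and keeps every row tied at the cutoff.

```python
import sqlite3


def top_n_per_item(pairs, n):
    """Return the n highest-count partners for every item, whichever column it was in."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, cnt INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", pairs)
    rows = con.execute("""
        WITH sym AS (
            SELECT col1, col2, cnt FROM t
            UNION ALL
            SELECT col2, col1, cnt FROM t      -- swapped copy of every pair
        )
        SELECT s.col1, s.col2, s.cnt
        FROM sym AS s
        -- keep a row if fewer than n rows of the same col1 have a larger count
        WHERE (SELECT COUNT(*) FROM sym AS s2
               WHERE s2.col1 = s.col1 AND s2.cnt > s.cnt) < ?
        ORDER BY s.col1, s.cnt DESC
    """, (n,)).fetchall()
    con.close()
    return rows


pairs = [
    ("item1", "item2", 3),
    ("item3", "item2", 1),
    ("item1", "item4", 1),
    ("item2", "item3", 2),
]

if __name__ == "__main__":
    for row in top_n_per_item(pairs, 1):
        print(row)
```

With n = 1 this reproduces the four rows the question lists as the correct result. On Postgres, a row_number() window over the same symmetrized set would be the more usual way to cap each group at exactly n rows.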

How to divide by a value in columnA that correlates to another value in ColumnB

I am creating a SQL report to show the number of vials on hand, sales orders, POs, etc.
My system stores everything in base units (mL), and I need to divide by the size of the DefaultPurchasingUnit, which is 11. How do I do that when everything comes from one table?
Item    UnitSize  Unit  DefaultPurchasingUnit
======  ========  ====  =====================
Item1   1         mL    vile
Item1   11        vile  vile
Item1   693       box   vile
You can use join or window functions:
select t.*,
t.unitsize / tdef.unitsize
from t left join
t tdef
on t.item = tdef.item and
t.DefaultPurchasingUnit = tdef.unit;
You can use a left join on the same table:
SELECT A.Item, A.UnitSize AS SizeInML, A.UnitSize / B.UnitSize AS DefaultUnitSize, B.Unit, B.DefaultPurchasingUnit
FROM itemTable A
LEFT JOIN itemTable B
ON A.Item = B.Item AND A.DefaultPurchasingUnit = B.Unit
I ran this with the values from your question (1, 11, 693 for mL, vile, box respectively), plus an Item2 whose default purchasing unit is box, and it gave the following result:
Item    SizeInML  DefaultUnitSize  Unit  DefaultPurchasingUnit
======  ========  ===============  ====  =====================
Item1   1         0.0909           vile  vile
Item1   11        1                vile  vile
Item1   693       63               vile  vile
Item2   1         0.0014           box   box
Item2   11        0.0159           box   box
Item2   693       1                box   box
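The self-join can be tried on the three rows from the question with the sketch below, which runs on SQLite via Python's sqlite3 module. The table name t and the output column InDefaultUnits are assumptions made for the illustration. Multiplying by 1.0 is deliberate: with two integer operands, SQLite (like SQL Server) performs integer division.

```python
import sqlite3


def sizes_in_default_unit(rows):
    """Express every UnitSize as a multiple of the item's default purchasing unit."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (Item TEXT, UnitSize INTEGER, Unit TEXT, "
        "DefaultPurchasingUnit TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT a.Item, a.Unit, a.UnitSize,
               a.UnitSize * 1.0 / d.UnitSize AS InDefaultUnits  -- 1.0 forces real division
        FROM t AS a
        LEFT JOIN t AS d                     -- d is the row describing the default unit
          ON d.Item = a.Item AND d.Unit = a.DefaultPurchasingUnit
        ORDER BY a.UnitSize
    """).fetchall()
    con.close()
    return result


rows = [
    ("Item1", 1, "mL", "vile"),
    ("Item1", 11, "vile", "vile"),
    ("Item1", 693, "box", "vile"),
]

if __name__ == "__main__":
    for row in sizes_in_default_unit(rows):
        print(row)
```

Because it is a LEFT JOIN, an item whose default unit has no row of its own still appears, with NULL in the computed column.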

Updating table cells until reaching a predefined total

I'm trying to solve a question via SQL ...
Let's suppose this is my table:
NAME | ITEM1 | ITEM2 | ITEM3
AAA 1 2 1
BBB 2 1 3
CCC 3 2 1
DDD 3 1 2
EEE 1 3 1
The values 1 and 2 must be kept as they are. The value 3 is the one I have to modify in every column: it must be changed to 1 until the column reaches a defined total of 1s, and to 2 otherwise.
For example: in the ITEM1 column, suppose I need the value 1 to appear three times. That means I should change one of the two 3s to 1 (it doesn't matter which) and the other one to 2. And so on for all the remaining columns ...
Could you please help me finding a quick way to do this?
I don't know if this will work for your case, but it does what you describe. The drawback is the size of the query (two updates for each column).
Note: #teste is your table.
declare @max_number_1_per_column int = 3
declare @coun_1 int
--assuming that you need to update all columns with the same rule (max number of 1s in each column is 3, in this example)
--assuming name is unique
--item1: turn 3s into 1s until the column holds the target number of 1s
select @coun_1 = count(*) from #teste where item1 = 1
if (@max_number_1_per_column - @coun_1 > 0)
    update #teste
    set item1 = 1
    where name in (select top(@max_number_1_per_column - @coun_1) name from #teste where item1 = 3)
--any 3s still left become 2
update #teste
set item1 = 2
where item1 = 3
---other column
select @coun_1 = count(*) from #teste where item2 = 1
if (@max_number_1_per_column - @coun_1 > 0)
    update #teste
    set item2 = 1
    where name in (select top(@max_number_1_per_column - @coun_1) name from #teste where item2 = 3)
update #teste
set item2 = 2
where item2 = 3
---other column
select @coun_1 = count(*) from #teste where item3 = 1
if (@max_number_1_per_column - @coun_1 > 0)
    update #teste
    set item3 = 1
    where name in (select top(@max_number_1_per_column - @coun_1) name from #teste where item3 = 3)
update #teste
set item3 = 2
where item3 = 3
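The same two-updates-per-column logic can be written as a loop so the query does not grow with the number of columns. The sketch below is an assumption-laden illustration rather than a drop-in replacement: it uses SQLite through Python's sqlite3 module (LIMIT instead of TOP), a regular table named teste, and a hypothetical helper fill_threes. Column names are interpolated into the SQL, so they come from a fixed whitelist only.

```python
import sqlite3

# Fixed whitelist: these names are interpolated into SQL text, never user input.
ITEM_COLUMNS = ("item1", "item2", "item3")


def fill_threes(con, target_ones):
    """Per column: turn 3s into 1s until target_ones is reached, remaining 3s into 2s."""
    for col in ITEM_COLUMNS:
        ones = con.execute(
            f"SELECT COUNT(*) FROM teste WHERE {col} = 1"
        ).fetchone()[0]
        missing = max(target_ones - ones, 0)
        # Promote at most `missing` rows; which rows get picked is arbitrary.
        con.execute(
            f"UPDATE teste SET {col} = 1 WHERE name IN "
            f"(SELECT name FROM teste WHERE {col} = 3 LIMIT ?)",
            (missing,),
        )
        # Whatever is still 3 becomes 2.
        con.execute(f"UPDATE teste SET {col} = 2 WHERE {col} = 3")


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE teste (name TEXT PRIMARY KEY, item1 INT, item2 INT, item3 INT)"
)
con.executemany(
    "INSERT INTO teste VALUES (?, ?, ?, ?)",
    [("AAA", 1, 2, 1), ("BBB", 2, 1, 3), ("CCC", 3, 2, 1),
     ("DDD", 3, 1, 2), ("EEE", 1, 3, 1)],
)
fill_threes(con, 3)

if __name__ == "__main__":
    for row in con.execute("SELECT * FROM teste ORDER BY name"):
        print(row)
```

Like the T-SQL above, this assumes name is unique and applies the same target to every column; a per-column target would only need a dictionary in place of the single argument.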

How to get total record count for records that match 3 different criteria in single query

I have a table similar to this
Item1 Item2
yes yes
yes no
yes yes
yes yes
etc., etc.
I need the count of records that have both Item1 and Item2, and also the counts of records that have only Item1 or only Item2, without duplicate records in the final query. Any suggestions will be greatly appreciated, as always.
Perhaps you just want group by:
select item1, item2, count(*)
from t
group by item1, item2;
If you specifically want to combine values, you could do:
select sum(case when item1 = 'yes' and item2 = 'yes' then 1 else 0 end) as two_yesses,
sum(case when (item1 = 'yes' or item2 = 'yes') and item1 <> item2
then 1 else 0
end) as one_yes
from t;
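If the "only Item1" and "only Item2" counts are needed separately, the conditional-aggregation pattern extends to three columns in one pass. A minimal sketch on SQLite via Python's sqlite3 module, assuming the columns hold only 'yes' or 'no' (no NULLs); the function name and result keys are made up for the illustration.

```python
import sqlite3


def yes_counts(rows):
    """Count rows with both flags set, only item1 set, and only item2 set."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (item1 TEXT, item2 TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    both, only1, only2 = con.execute("""
        SELECT SUM(CASE WHEN item1 = 'yes' AND item2 = 'yes' THEN 1 ELSE 0 END),
               SUM(CASE WHEN item1 = 'yes' AND item2 <> 'yes' THEN 1 ELSE 0 END),
               SUM(CASE WHEN item1 <> 'yes' AND item2 = 'yes' THEN 1 ELSE 0 END)
        FROM t
    """).fetchone()
    con.close()
    return {"both": both, "only_item1": only1, "only_item2": only2}


# The four sample rows shown in the question.
sample = [("yes", "yes"), ("yes", "no"), ("yes", "yes"), ("yes", "yes")]

if __name__ == "__main__":
    print(yes_counts(sample))
```

Each row satisfies at most one of the three conditions, so no record is counted twice.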

How to express this query in SQL Server 2008

I have a table called Reporting with the following columns:
OutletId CampaignId ItemId Qty
10 1 Item1 12
10 1 Item2 13
10 1 Item3 14
20 2 Item4 10
20 2 Item5 11
20 2 Item6 12
20 2 Item7 8
Now I want to retrieve the data in this format.
When the user selects CampaignId = 1:
OutletId CampaignId Item1 Item2 Item3
10 1 12 13 14
When the user selects CampaignId = 2:
OutletId CampaignId Item4 Item5 Item6 Item7
20 2 10 11 12 8
The items for a campaign are not fixed.
I think it is efficient in this way:
SELECT *
FROM
(
SELECT OutletId, CampaignId, ItemId, Qty
FROM Reporting) AS p
PIVOT
(
SUM(Qty)
FOR ItemId IN (SELECT ItemId FROM Reporting WHERE campaignId =1)
) as pvt
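Comment: use campaignId = 1, campaignId = 2, or whichever campaign you need. Note that SQL Server's PIVOT expects a literal column list in the IN clause rather than a subquery, so in practice this statement has to be generated as dynamic SQL.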
A possible solution would be:
SELECT *
FROM
(
SELECT OutletId, CampaignId,ItemId, Qty
FROM test) AS p
PIVOT
(
SUM(Qty)
FOR ItemId IN (Item1,Item2,Item3,Item4)
) as pvt
But obviously, as commented before, this is not very efficient because you don't always know the items in advance. You can either redesign your table or, if using PIVOT, build the list of pivot items first and run the query as dynamic SQL.
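The "build the item list first, then generate the query" approach can be sketched as follows. This is an illustration on SQLite through Python's sqlite3 module, not SQL Server code: SQLite has no PIVOT, so each item becomes a conditional aggregate instead, and the helper names make_reporting and pivot_campaign are hypothetical. Item values are bound as parameters, and only the quoted column aliases are written into the SQL text.

```python
import sqlite3


def make_reporting():
    """In-memory copy of the Reporting table from the question."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Reporting (OutletId INT, CampaignId INT, ItemId TEXT, Qty INT)"
    )
    con.executemany("INSERT INTO Reporting VALUES (?, ?, ?, ?)", [
        (10, 1, "Item1", 12), (10, 1, "Item2", 13), (10, 1, "Item3", 14),
        (20, 2, "Item4", 10), (20, 2, "Item5", 11), (20, 2, "Item6", 12),
        (20, 2, "Item7", 8),
    ])
    return con


def pivot_campaign(con, campaign_id):
    """Return (header, rows) with one column per item of the given campaign."""
    # Step 1: discover which items this campaign has.
    items = [row[0] for row in con.execute(
        "SELECT DISTINCT ItemId FROM Reporting WHERE CampaignId = ? ORDER BY ItemId",
        (campaign_id,))]
    # Step 2: generate one conditional aggregate per item.
    select_list = ["OutletId", "CampaignId"]
    for item in items:
        alias = '"' + item.replace('"', '""') + '"'   # quote as an identifier
        select_list.append(f"SUM(CASE WHEN ItemId = ? THEN Qty END) AS {alias}")
    sql = ("SELECT " + ", ".join(select_list) + " FROM Reporting "
           "WHERE CampaignId = ? GROUP BY OutletId, CampaignId")
    cur = con.execute(sql, (*items, campaign_id))
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


if __name__ == "__main__":
    con = make_reporting()
    print(pivot_campaign(con, 1))
    print(pivot_campaign(con, 2))
```

On SQL Server 2008 the same two steps would typically be done by concatenating the quoted item names into the IN list of a PIVOT statement and executing the resulting string.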