How to retrieve unique rows where multiple children that reference it exist for different types? - sql

SELECT * FROM Fruit
INNER JOIN Apple ON Fruit.Id = Apple.FruitId
WHERE Apple.Type = 1 AND Apple.Type = 3
I need the unique rows of Fruit that have Apples of both type 1 and type 3. Apple.Type is considered unique, though I don't think that matters.
With these rows, the query should return two rows, Fruit #50 and #52. The most important part is the Fruit.Id. I don't need to return the Types; I just need to make sure every Fruit returned has at least one Apple with Type = 1 and one with Type = 3.
Apple { Id = 1, FruitId = 50, Type = 0 }
Apple { Id = 2, FruitId = 50, Type = 1 }
Apple { Id = 3, FruitId = 50, Type = 3 }
Apple { Id = 4, FruitId = 51, Type = 1 }
Apple { Id = 5, FruitId = 51, Type = 2 }
Apple { Id = 6, FruitId = 52, Type = 3 }
Apple { Id = 7, FruitId = 52, Type = 1 }
Apple { Id = 8, FruitId = 52, Type = 2 }
Fruit { Id = 50 }
Fruit { Id = 51 }
Fruit { Id = 52 }
I'm not quite sure how to use DISTINCT and/or GROUP BY in order to form this query.

Group your apples table by fruit id and pick the results that have both desired types. Use this to get your fruits.
SELECT *
FROM Fruit
WHERE id IN
(
SELECT FruitId
FROM Apple
WHERE Type IN (1,3)
GROUP BY FruitId
HAVING COUNT(DISTINCT Type) = 2
);
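To check the grouped query against the sample rows from the question, here is a minimal sketch that assumes SQLite through Python's sqlite3 module rather than the asker's actual database. The helper name `fruits_with_all_types` is hypothetical; it only generalizes the hard-coded `(1,3)` and `= 2` to any list of wanted types.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fruit (Id INTEGER PRIMARY KEY);
    CREATE TABLE Apple (Id INTEGER PRIMARY KEY, FruitId INTEGER, Type INTEGER);
    INSERT INTO Fruit (Id) VALUES (50), (51), (52);
    INSERT INTO Apple (Id, FruitId, Type) VALUES
        (1, 50, 0), (2, 50, 1), (3, 50, 3),
        (4, 51, 1), (5, 51, 2),
        (6, 52, 3), (7, 52, 1), (8, 52, 2);
""")


def fruits_with_all_types(conn, wanted_types):
    """Return Fruit ids that have at least one Apple of every wanted type."""
    wanted = sorted(set(wanted_types))
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT Id FROM Fruit
        WHERE Id IN (
            SELECT FruitId FROM Apple
            WHERE Type IN ({placeholders})
            GROUP BY FruitId
            HAVING COUNT(DISTINCT Type) = ?
        )
        ORDER BY Id
    """
    # The last parameter is the number of distinct types each fruit must cover.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


print(fruits_with_all_types(conn, [1, 3]))  # [50, 52]
```

COUNT(DISTINCT Type) matters here: a plain COUNT(*) would also accept a fruit with two apples of type 1 and none of type 3.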

This would return the fruits with ID 50 and 52.
SELECT *
FROM Fruit
WHERE EXISTS (
SELECT 1 FROM Apple
WHERE Type = 1 AND Apple.FruitId = Fruit.Id
) AND EXISTS (
SELECT 1 FROM Apple
WHERE Type = 3 AND Apple.FruitId = Fruit.Id
)
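The difference between the join from the question and the EXISTS form can be seen by running both on the sample data. This is a sketch under the assumption that SQLite (via Python's sqlite3) behaves like the asker's database for these two queries; the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fruit (Id INTEGER PRIMARY KEY);
    CREATE TABLE Apple (Id INTEGER PRIMARY KEY, FruitId INTEGER, Type INTEGER);
    INSERT INTO Fruit (Id) VALUES (50), (51), (52);
    INSERT INTO Apple (Id, FruitId, Type) VALUES
        (1, 50, 0), (2, 50, 1), (3, 50, 3),
        (4, 51, 1), (5, 51, 2),
        (6, 52, 3), (7, 52, 1), (8, 52, 2);
""")

# Query from the question: the WHERE clause is evaluated per joined row,
# and a single Apple row cannot have Type = 1 and Type = 3 at the same time.
join_rows = conn.execute("""
    SELECT Fruit.Id FROM Fruit
    INNER JOIN Apple ON Fruit.Id = Apple.FruitId
    WHERE Apple.Type = 1 AND Apple.Type = 3
""").fetchall()

# EXISTS form: each subquery scans the fruit's apples independently,
# so the two conditions can be satisfied by two different Apple rows.
exists_rows = conn.execute("""
    SELECT Id FROM Fruit
    WHERE EXISTS (SELECT 1 FROM Apple WHERE Type = 1 AND Apple.FruitId = Fruit.Id)
      AND EXISTS (SELECT 1 FROM Apple WHERE Type = 3 AND Apple.FruitId = Fruit.Id)
    ORDER BY Id
""").fetchall()

print(join_rows)    # []
print(exists_rows)  # [(50,), (52,)]
```

Because EXISTS only tests for presence, each Fruit appears at most once and no DISTINCT is needed.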

Not the most efficient way, but pivoting the types out into columns, so that each FruitId row carries a Type1 and a Type3 value, should do it.
create table type_1 as select FruitId, Type as Type1 from Apple where Type = 1;
create table type_3 as select FruitId, Type as Type3 from Apple where Type = 3;
create table Fruits as select distinct FruitId from Apple;
create table Fruit_Agg as select a.FruitId, b.Type1, c.Type3 from Fruits a left join type_1 b on a.FruitId = b.FruitId left join type_3 c on a.FruitId = c.FruitId;
create table Types_1and_3 as select FruitId from Fruit_Agg where Type1 = 1 and Type3 = 3;

All possible unique combination of a single column value partitioned by groups

I have a table like the following in Google BigQuery. I am trying to get all possible unique combinations (all subsets except the empty subset) of the Item column, partitioned by Group.
Group Item
1 A
1 B
1 C
2 X
2 Y
2 Z
I am looking for an output like the following:
Group Item
1 A
1 B
1 C
1 A,B
1 B,C
1 A,C
1 A,B,C
2 X
2 Y
2 Z
2 X,Y
2 Y,Z
2 X,Z
2 X,Y,Z
I have tried to adapt the accepted answer to the following question so that it takes Group into account, to no avail:
How to get combination of value from single column?
Consider the approach below:
CREATE TEMP FUNCTION generate_combinations(a ARRAY<STRING>)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
var combine = function(a) {
var fn = function(n, src, got, all) {
if (n == 0) {
if (got.length > 0) {
all[all.length] = got;
} return;
}
for (var j = 0; j < src.length; j++) {
fn(n - 1, src.slice(j + 1), got.concat([src[j]]), all);
} return;
}
var all = []; for (var i = 1; i < a.length; i++) {
fn(i, a, [], all);
}
all.push(a);
return all;
}
return combine(a)
''';
with your_table as (
select 1 as _Group,'A' as Item union all
select 1, 'B' union all
select 1, 'C' union all
select 2, 'X' union all
select 2, 'Y' union all
select 2, 'Z'
)
select _group, item
from (
select _group, generate_combinations(array_agg(item)) items
from your_table
group by _group
), unnest(items) item
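For comparison, the same idea (collect the items of each group, then emit every non-empty subset) can be sketched outside BigQuery. The following Python uses itertools.combinations instead of the JavaScript UDF; `non_empty_subsets` is a hypothetical helper name, and the sample rows are the ones from the question.

```python
from itertools import combinations


def non_empty_subsets(items):
    """Yield every non-empty subset of items as a comma-joined string, shortest first."""
    ordered = sorted(set(items))
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield ",".join(combo)


rows = [(1, "A"), (1, "B"), (1, "C"), (2, "X"), (2, "Y"), (2, "Z")]

# Collect the items of each group (the role of ARRAY_AGG ... GROUP BY in the query).
groups = {}
for group, item in rows:
    groups.setdefault(group, []).append(item)

# Expand each group into one row per subset (the role of UNNEST in the query).
result = [
    (group, subset)
    for group, items in sorted(groups.items())
    for subset in non_empty_subsets(items)
]

for group, subset in result:
    print(group, subset)
```

A group of n distinct items yields 2^n - 1 rows, so the output grows quickly with the group size.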
Try this
with _data as
(
select 1 as _Group,'A' as Item union all
select 1 as _Group,'B' as Item union all
select 1 as _Group,'C' as Item union all
select 2 as _Group,'X' as Item union all
select 2 as _Group,'Y' as Item union all
select 2 as _Group,'Z' as Item
)
select distinct _Group ,Item from
(
select _Group,
Item
from _data
union all
select _Group,
string_agg(Item ,',') over(partition by _Group order by Item ) as item
from _data
union all
select a._Group ,
concat(a.item,',',b.item)
from _data a left join _data b on a._group = b._group and a.Item < b.Item
)
where item is not null
order by _group

Compare Tuples value present inside a bag with a hardcoded String value

I have a data set with these columns:-
FMID,County,WIC,WICcash
Here is a sample of data:-
1002267,Douglas,Y,N
21005876,Douglas,Y,N
1001666,Douglas,N,Y
I have grouped the data based on County and have filtered the data based on County = 'Douglas'. Here is the output:
(Douglas,{(1002267,Douglas,Y,N),(21005876,Douglas,Y,N),(1001666,Douglas,N,Y)})
Now, wherever the WIC or WICcash column has the value Y, I want to count it and take the combined count across both columns.
Here, the WIC and WICcash columns together contain 3 Y values, so my output should be
Douglas 3
How can I achieve this?
Below is the code that I have written so far:
load_data = LOAD 'PigPrograms/Markets/DATA_GOV_US_Farmers_Market_DataSet.csv' USING PigStorage(',') as (FMID:long,County:chararray, WIC:chararray, WICcash:chararray);
group_markets_by_county = GROUP load_data BY County;
filter_county = FILTER group_markets_by_county BY group == 'Douglas';
DUMP filter_county;
To look inside a bag, you can use a nested FOREACH.
A = LOAD 'input3.txt' AS (FMID:long,County:chararray, WIC:chararray, WICcash:chararray);
B = GROUP A by County;
describe B; /* B: {group: chararray,A: {(FMID: long,County: chararray,WIC: chararray,WICcash: chararray)}} */
C = FOREACH B {
FILTER_WIC_Y = FILTER A by WIC == 'Y';
COUNT_WIC_Y = COUNT(FILTER_WIC_Y);
FILTER_WICcash_Y = FILTER A by WICcash == 'Y';
COUNT_WICcash_Y = COUNT(FILTER_WICcash_Y);
GENERATE group, COUNT_WIC_Y + COUNT_WICcash_Y as count;
}
dump C;
Or you can map 'Y' and 'N' to 1 and 0 and add them up.
A = LOAD 'input3.txt' AS (FMID:long,County:chararray, WIC:chararray, WICcash:chararray);
B = FOREACH A GENERATE FMID, County, (WIC == 'Y' ? 1 : 0 ) as wic, (WICcash == 'Y' ? 1 : 0 ) as wiccash;
C = GROUP B by County;
D = FOREACH C GENERATE group, SUM(B.wic) + SUM(B.wiccash) as count;
dump D;
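The second approach (turn each flag into 1 or 0, then sum per county) is easy to verify on the three sample rows. This is a plain Python sketch of the same logic, not Pig; the function name `count_y_by_county` is made up for illustration.

```python
from collections import Counter

# Sample rows from the question: (FMID, County, WIC, WICcash)
rows = [
    (1002267, "Douglas", "Y", "N"),
    (21005876, "Douglas", "Y", "N"),
    (1001666, "Douglas", "N", "Y"),
]


def count_y_by_county(rows):
    """Sum the number of 'Y' flags in the WIC and WICcash columns per county."""
    counts = Counter()
    for _fmid, county, wic, wiccash in rows:
        # Each comparison is a bool, which adds as 1 (True) or 0 (False).
        counts[county] += (wic == "Y") + (wiccash == "Y")
    return dict(counts)


print(count_y_by_county(rows))  # {'Douglas': 3}
```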

How to get combination of value from single column?

I'm trying to get the distinct possible combinations of values from a single column in BigQuery.
Suppose I have this table:
+----------+-----+--------+-----------+------+
|date      |type |payment |customer_no|status|
+----------+-----+--------+-----------+------+
|2019-01-02|Shirt|Cashless|        101|Cancel|
|2019-01-02|Jeans|Cashless|        133|OK    |
|2019-01-02|Jeans|Cash    |        102|OK    |
|2019-01-02|Cap  |Cash    |        144|OK    |
|2019-01-02|Shirt|Cash    |        132|OK    |
|2019-01-01|Jeans|Cash    |        111|Cancel|
|2019-01-01|Cap  |Cash    |        141|OK    |
|2019-01-01|Shirt|Cash    |        101|OK    |
|2019-01-01|Jeans|Cash    |        105|OK    |
+----------+-----+--------+-----------+------+
I want to apply these rules:
Only status = 'OK'
No repeated combinations: Shirt, Jeans and Jeans, Shirt count as the same combination
Group by each payment and their combination (Cash, Cashless, Cash & Cashless)
With this code:
#standardSQL
SELECT date,
type,
COUNT(customer_no) as total_customer_per_order_type,
payment
FROM `blabla.order`
WHERE status = 'OK'
GROUP BY date, type , payment
ORDER BY date DESC, payment ASC
I only get the total number of customers for a single type.
How can I get a table like this:
http://imgur.com/7aECjpSl.png
The following is for BigQuery Standard SQL and answers only the exact question in the title of your post, which is:
How to get combination of value from single column?
#standardSQL
CREATE TEMP FUNCTION test(a ARRAY<INT64>)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
var combine = function(a) {
var fn = function(n, src, got, all) {
if (n == 0) {
if (got.length > 0) {
all[all.length] = got;
} return;
}
for (var j = 0; j < src.length; j++) {
fn(n - 1, src.slice(j + 1), got.concat([src[j]]), all);
} return;
}
var all = [];
for (var i = 1; i < a.length; i++) {
fn(i, a, [], all);
}
all.push(a);
return all;
}
return combine(a)
''';
WITH types AS (
SELECT DISTINCT type, CAST(DENSE_RANK() OVER(ORDER BY type) AS STRING) type_num
FROM `project.dataset.order`
WHERE status = 'OK'
)
SELECT items, STRING_AGG(type ORDER BY type_num) types
FROM UNNEST(test(GENERATE_ARRAY(1,(SELECT COUNT(1) FROM types)))) AS items,
UNNEST(SPLIT(items)) AS pos
JOIN types ON pos = type_num
GROUP BY items
You can test and play with the above using the sample data from your question, as shown below:
#standardSQL
CREATE TEMP FUNCTION test(a ARRAY<INT64>)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
var combine = function(a) {
var fn = function(n, src, got, all) {
if (n == 0) {
if (got.length > 0) {
all[all.length] = got;
} return;
}
for (var j = 0; j < src.length; j++) {
fn(n - 1, src.slice(j + 1), got.concat([src[j]]), all);
} return;
}
var all = [];
for (var i = 1; i < a.length; i++) {
fn(i, a, [], all);
}
all.push(a);
return all;
}
return combine(a)
''';
WITH `project.dataset.order` AS (
SELECT '2019-01-02' dt, 'Shirt' type, 'Cashless' payment, 101 customer_no, 'Cancel' status UNION ALL
SELECT '2019-01-02', 'Jeans', 'Cashless', 133, 'OK' UNION ALL
SELECT '2019-01-02', 'Jeans', 'Cash', 102, 'OK' UNION ALL
SELECT '2019-01-02', 'Cap', 'Cash', 144, 'OK' UNION ALL
SELECT '2019-01-02', 'Shirt', 'Cash', 132, 'OK' UNION ALL
SELECT '2019-01-01', 'Jeans', 'Cash', 111, 'Cancel' UNION ALL
SELECT '2019-01-01', 'Cap', 'Cash', 141, 'OK' UNION ALL
SELECT '2019-01-01', 'Shirt', 'Cash', 101, 'OK' UNION ALL
SELECT '2019-01-01', 'Jeans', 'Cash', 105, 'OK'
), types AS (
SELECT DISTINCT type, CAST(DENSE_RANK() OVER(ORDER BY type) AS STRING) type_num
FROM `project.dataset.order`
WHERE status = 'OK'
)
SELECT items, STRING_AGG(type ORDER BY type_num) types
FROM UNNEST(test(GENERATE_ARRAY(1,(SELECT COUNT(1) FROM types)))) AS items,
UNNEST(SPLIT(items)) AS pos
JOIN types ON pos = type_num
GROUP BY items
with result
Row items types
1 1 Cap
2 2 Jeans
3 3 Shirt
4 1,2 Cap,Jeans
5 1,3 Cap,Shirt
6 2,3 Jeans,Shirt
7 1,2,3 Cap,Jeans,Shirt

Group By with 'HAVING' clause on slick+play

Imagine I have a SQL table grades which has amongst other fields, the name of the student and the result of the grade:
| student | grade |
|----------|:---------:|
| Harry | Good |
| Ron | Good |
| Harry | Average |
| Harry | Fail |
| Hermione | Excellent |
| Hermione | Excellent |
| Ron | Average |
| ..... | .... |
If I wanted to select all the students with at least two 'Excellent' and zero 'Fail' grades one could do:
select student
from grades
group by student
having
sum(case when grade = 'Excellent' then 1 else 0 end) >= 2 and
sum(case when grade = 'Fail' then 1 else 0 end) = 0
How could I translate such a query into Slick?
The 'having' example given in the documentation is simpler than what I need.
gradesTables
.groupBy(_.student)
.map{ case(student, group) => (student, ???)}
.filter(???)
.list
On a related note, why do I get an error with the following:
gradesTables
.groupBy(_.student)
.map{ case(student, group) => (student, group.filter(_.grade == "Fail").length)}
.list
The error is:
slick.SlickTreeException: Cannot convert node to SQL Comprehension
The following code in Slick will generate the SQL you need:
val query: Query[(Rep[String], Rep[Option[Int]], Rep[Option[Int]]), (String, Option[Int], Option[Int]), Seq] =
grades.groupBy( _.student ).map{ case (student, group) =>
val groupList = group.map(_.grade)
val gradeExcel = groupList.map( grade =>
Case.If(grade === "Excellent").Then(1).Else(0) ).sum
val gradeFail = groupList.map( grade =>
Case.If(grade === "Fail").Then(1).Else(0) ).sum
(student, gradeExcel, gradeFail)
}.
filter( g => g._2 >= 2 && g._3 === 0 )
// ...
println("Generated SQL:\n" + query.result.statements)
// Generated SQL:
// List(
// select "STUDENT", sum((case when ("GRADE" = 'Excellent') then 1 else 0 end)),
// sum((case when ("GRADE" = 'Fail') then 1 else 0 end)) from "GRADES" group by "STUDENT"
// having (sum((case when ("GRADE" = 'Excellent') then 1 else 0 end)) >= 2) and
// (sum((case when ("GRADE" = 'Fail') then 1 else 0 end)) = 0)
// )
db.run(query.result.map(println))
// Vector((Hermione,Some(2),Some(0)))
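The generated SQL can also be checked without Slick. The sketch below assumes SQLite through Python's sqlite3 module and loads only the seven rows visible in the question's table (the elided rows are left out), then runs the same conditional-sum HAVING clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [
        ("Harry", "Good"), ("Ron", "Good"), ("Harry", "Average"), ("Harry", "Fail"),
        ("Hermione", "Excellent"), ("Hermione", "Excellent"), ("Ron", "Average"),
    ],
)

# Conditional sums count matching rows per student; HAVING filters on those counts.
rows = conn.execute("""
    SELECT student,
           SUM(CASE WHEN grade = 'Excellent' THEN 1 ELSE 0 END) AS excellent,
           SUM(CASE WHEN grade = 'Fail' THEN 1 ELSE 0 END) AS fail
    FROM grades
    GROUP BY student
    HAVING SUM(CASE WHEN grade = 'Excellent' THEN 1 ELSE 0 END) >= 2
       AND SUM(CASE WHEN grade = 'Fail' THEN 1 ELSE 0 END) = 0
""").fetchall()

print(rows)  # [('Hermione', 2, 0)]
```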

How to compare and sum values of multiple periods using LINQ in VB.Net

I have the following example datatable:
Value1 Value2 Customer Product Date
100 50 1000 100 1.8.2010
50 20 1000 101 5.1.2010
200 60 1000 100 6.2.2011
180 100 1001 100 7.3.2010
500 700 1000 100 1.1.2010
300 300 1001 100 4.4.2011
250 600 1000 100 3.3.2011
And now the user should be able to compare multiple periods. In this example the user chose two periods: 1.1.2010 - 31.12.2010 and 1.1.2011 - 31.12.2011. The result of the example should be:
Customer Product SumValue1Period1 SumValue2Period1 SumValue1Period2 SumValue2Period2
1000 100 600 750 450 660
1000 101 50 20 0 0
1001 100 180 100 300 300
How can I do this?
Since you have a known number of columns, you can group the data by Customer and Product and then take a conditional sum within each group for every column of the result.
Please have a look at the following LINQPad program. Sorry, I'm not familiar with VB.Net, so I have coded it in C#, but you'll get the idea:
void Main()
{
var Period1Start = new DateTime(2010,1,1);
var Period1End = new DateTime(2010,12,31);
var Period2Start = new DateTime(2011,1,1);
var Period2End = new DateTime(2011,12,31);
List<Item> lst = new List<Item>
{
new Item{ Value1 = 100, Value2 = 50, Customer = 1000, Product = 100 , Date = new DateTime(2010,8,1)},
new Item{ Value1 = 50, Value2 = 20, Customer = 1000, Product = 101 , Date = new DateTime(2010,5,1)},
new Item{ Value1 = 200, Value2 = 60, Customer = 1000, Product = 100 , Date = new DateTime(2011,2,6)},
new Item{ Value1 = 180, Value2 = 100, Customer = 1001, Product = 100 , Date = new DateTime(2010,7,3)},
new Item{ Value1 = 500, Value2 = 700, Customer = 1000, Product = 100 , Date = new DateTime(2010,1,1)},
new Item{ Value1 = 300, Value2 = 300, Customer = 1001, Product = 100 , Date = new DateTime(2011,4,4)},
new Item{ Value1 = 250, Value2 = 600, Customer = 1000, Product = 100 , Date = new DateTime(2011,3,3)}
};
var grp = lst.GroupBy(x=>new{x.Customer, x.Product}).
Select(y=> new
{
Customer = y.Key.Customer,
Product = y.Key.Product,
SumValue1Period1 = y.Where(x=>x.Date >= Period1Start && x.Date<= Period1End).Sum(p=>p.Value1),
SumValue2Period1 = y.Where(x=>x.Date >= Period1Start && x.Date<= Period1End).Sum(p=>p.Value2),
SumValue1Period2 = y.Where(x=>x.Date >= Period2Start && x.Date<= Period2End).Sum(p=>p.Value1),
SumValue2Period2 = y.Where(x=>x.Date >= Period2Start && x.Date<= Period2End).Sum(p=>p.Value2)
});
Console.WriteLine(grp);
}
// Define other methods and classes here
public class Item
{
public int Value1{get;set;}
public int Value2{get;set;}
public int Customer{get;set;}
public int Product{get;set;}
public DateTime Date{get;set;}
}
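The same group-then-conditional-sum idea extends to any number of periods. Here is a hedged Python sketch of it (not VB.Net or LINQ); it assumes the dates in the question are written day.month.year, and `sums_by_period` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (Value1, Value2, Customer, Product, Date),
# reading the dates as day.month.year (an assumption about the source format).
rows = [
    (100, 50, 1000, 100, date(2010, 8, 1)),
    (50, 20, 1000, 101, date(2010, 1, 5)),
    (200, 60, 1000, 100, date(2011, 2, 6)),
    (180, 100, 1001, 100, date(2010, 3, 7)),
    (500, 700, 1000, 100, date(2010, 1, 1)),
    (300, 300, 1001, 100, date(2011, 4, 4)),
    (250, 600, 1000, 100, date(2011, 3, 3)),
]
periods = [
    (date(2010, 1, 1), date(2010, 12, 31)),
    (date(2011, 1, 1), date(2011, 12, 31)),
]


def sums_by_period(rows, periods):
    """Map (customer, product) to [value1_p1, value2_p1, value1_p2, value2_p2, ...]."""
    totals = defaultdict(lambda: [0] * (2 * len(periods)))
    for value1, value2, customer, product, day in rows:
        bucket = totals[(customer, product)]  # all zeros the first time a key is seen
        for i, (start, end) in enumerate(periods):
            if start <= day <= end:
                bucket[2 * i] += value1
                bucket[2 * i + 1] += value2
    return dict(totals)


result = sums_by_period(rows, periods)
for (customer, product), sums in sorted(result.items()):
    print(customer, product, *sums)
```

Groups with no rows in a period keep their zeros, which matches the 0 entries expected for product 101 in the second period.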
Take a look at
http://msdn.microsoft.com/en-us/vbasic/bb737908
Specifically, the 'GroupBy - Nested' example. It shows using LINQ to group a 'customer's orders, first by year, and then by month.' Your situation should be more straightforward, since it only involves the date range.