BQ query, not exists/joins - google-bigquery

In BigQuery I have a table:
VM    Package
VM1   A
VM2   B
VM3   A
VM4   B
VM1   B
VM2   C
VM3   B
VM4   C
How can I get a result in which every distinct VM is listed once, with the Package column set to null (or empty, or Yes/No) when a particular package is not installed? That is, I need all VMs listed without duplicates: those that have package A installed show A, and the rest show, say, null:
VM    Package
VM1   A
VM2   null
VM3   A
VM4   null

Count how many rows with package A each VM has, and apply a condition to that count:
SELECT VM, IF(COUNTIF(Package = 'A') = 0, NULL, 'A') AS Package
FROM table1
GROUP BY VM
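A minimal sketch of the same per-VM conditional count, run against an in-memory SQLite database from Python so it can be tried locally. SQLite has no `COUNTIF`, so this assumes the equivalent `SUM(CASE WHEN ... THEN 1 ELSE 0 END)`; the table name `table1` comes from the answer and the rows from the question.

```python
import sqlite3

# In-memory stand-in for the BigQuery table; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (VM TEXT, Package TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("VM1", "A"), ("VM2", "B"), ("VM3", "A"), ("VM4", "B"),
     ("VM1", "B"), ("VM2", "C"), ("VM3", "B"), ("VM4", "C")],
)

# SUM(CASE ...) plays the role of BigQuery's COUNTIF(Package = 'A').
rows = conn.execute(
    """
    SELECT VM,
           CASE WHEN SUM(CASE WHEN Package = 'A' THEN 1 ELSE 0 END) = 0
                THEN NULL ELSE 'A' END AS Package
    FROM table1
    GROUP BY VM
    ORDER BY VM
    """
).fetchall()

for vm, package in rows:
    print(vm, package)
```

The `ORDER BY VM` is only there to make the output deterministic; the grouping alone already guarantees one row per VM.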

Another option
select VM,
if('A' in unnest(array_agg(Package)), 'A', null) Package
from your_table
group by VM
which produces the expected output shown in the question.
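The `ARRAY_AGG` variant works in two steps: gather every package of a VM into one array, then test whether 'A' is a member. A plain-Python sketch of that same idea, with the rows taken from the question (no BigQuery API is used or implied):

```python
from collections import defaultdict

# (VM, Package) rows copied from the question.
rows = [("VM1", "A"), ("VM2", "B"), ("VM3", "A"), ("VM4", "B"),
        ("VM1", "B"), ("VM2", "C"), ("VM3", "B"), ("VM4", "C")]

# Step 1, like ARRAY_AGG(Package) ... GROUP BY VM: collect each VM's packages.
packages_by_vm = defaultdict(list)
for vm, package in rows:
    packages_by_vm[vm].append(package)

# Step 2, like IF('A' IN UNNEST(...), 'A', NULL): membership test per group.
result = {vm: ("A" if "A" in pkgs else None)
          for vm, pkgs in sorted(packages_by_vm.items())}
print(result)
```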


How to add weights to records to find the most closely matching results

I'm developing a web application using ASP.NET Web Forms (C#), with SQL Server for the database. I have three main tables:
Services (more than 10K records):
Service_Id Service_Description
1 Clean
2 Oil Change
3 Fluid Services
4 Filter Replacement
The second table holds customer requests:
Customer_Requests:
Customer_ResuestId Customer_RequstedServices
1 1
1 2
1 3
1 4
2 4
2 5
The third main table is Branches (I have about 500 branches; each branch offers specific services):
Branches:
Branch_Id Branch_AvailabeServices
1 1
1 2
1 3
2 1
2 2
2 3
2 4
My question: how can I add weights to each Customer_RequstedServices entry and find the most closely matching branches? For example, I want results like this:
Customer request #1
has 4 requested services (25% weight each).
Branch #2 offers 100% of the requested services, and Branch #1 offers 75% of them.
I want to show Branch #2 first, as it offers everything the customer requested.
I was able to get the weight by using:
Select count(Customer_ResuestId) as ServiceCount from Customer_Requests
and then computing 100/ServiceCount to get the weight of each record,
but I don't know how to find which of the requested services each branch can offer for a specific request.
Any help would be really appreciated. If the weights are difficult or impossible to measure, then finding the branch that offers the most requested services would be great and acceptable.
This should do it:
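SELECT
  CR.Customer_ResuestId,
  -- total number of services in this customer request
  (SELECT COUNT(*) FROM Customer_Requests AS CRc
   WHERE CRc.Customer_ResuestId = CR.Customer_ResuestId) AS RequestedServices,
  B.Branch_Id,
  -- how many of those services this branch offers
  COUNT(*) AS MatchingServicesOffered
FROM Customer_Requests CR
INNER JOIN Branches B
  ON B.Branch_AvailabeServices = CR.Customer_RequstedServices
GROUP BY CR.Customer_ResuestId,
  B.Branch_Id
-- best-matching branches first within each request
ORDER BY CR.Customer_ResuestId, MatchingServicesOffered DESC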
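A sketch that also turns the counts into the percentage weights from the question (matched services / requested services × 100), run in SQLite from Python with the sample rows from the question. The `totals` CTE is an assumption of this sketch rather than part of the answer; it computes the per-request denominator once instead of using a correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer_Requests (Customer_ResuestId INT, Customer_RequstedServices INT);
    CREATE TABLE Branches (Branch_Id INT, Branch_AvailabeServices INT);
    INSERT INTO Customer_Requests VALUES (1,1),(1,2),(1,3),(1,4),(2,4),(2,5);
    INSERT INTO Branches VALUES (1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(2,4);
    """
)

ranking = conn.execute(
    """
    WITH totals AS (
        -- number of services in each request (the weight denominator)
        SELECT Customer_ResuestId, COUNT(*) AS requested
        FROM Customer_Requests
        GROUP BY Customer_ResuestId
    )
    SELECT cr.Customer_ResuestId,
           b.Branch_Id,
           COUNT(*) AS matched,
           t.requested,
           100.0 * COUNT(*) / t.requested AS pct
    FROM Customer_Requests AS cr
    JOIN Branches AS b ON b.Branch_AvailabeServices = cr.Customer_RequstedServices
    JOIN totals AS t ON t.Customer_ResuestId = cr.Customer_ResuestId
    GROUP BY cr.Customer_ResuestId, b.Branch_Id, t.requested
    ORDER BY cr.Customer_ResuestId, pct DESC
    """
).fetchall()

for row in ranking:
    print(row)
```

Because of the inner join, a branch that offers none of the requested services (Branch 1 for request 2 here) simply does not appear; a LEFT JOIN from a list of branches would be needed to show it with 0%.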

Oracle - Tree like query

There is a table, Groups, with three columns: ID, NAME, PARENT.
The hierarchy works like this:
Suppose there is a group ELECTRONICS.
Under ELECTRONICS there is MOBILE.
Under MOBILE there is SAMSUNG.
Under SAMSUNG there is GALAXY EDGE.
Under GALAXY EDGE there are 16GB and 8GB.
The data in the database would look like this:
ID NAME PARENT
1 ELECTRONICS null
2 MOBILE ELECTRONICS
3 SAMSUNG MOBILE
4 16GB SAMSUNG
5 8GB SAMSUNG
There may be N levels in the hierarchy. I want to retrieve all records at the last level (the leaf nodes).
In this case, that means returning 16GB and 8GB.
The usual approach to such a problem is a recursive query; in Oracle this can be done with CONNECT BY.
However, no recursive query is needed to get all rows at the last level.
Those are simply the rows whose name does not appear in the PARENT column:
select *
from Groups
where name not in (select parent
from groups g2
where g2.parent is not null);
SQLFiddle: http://sqlfiddle.com/#!4/df70d/1
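The `WHERE g2.parent IS NOT NULL` filter in that subquery is essential: if the subquery returns a NULL, `NOT IN` can never evaluate to true and the query returns nothing. A small sketch demonstrating both cases, using SQLite from Python as a stand-in for Oracle; the table name is quoted defensively because GROUPS is also an SQL keyword (a window-frame unit).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Groups" (id INTEGER, name TEXT, parent TEXT)')
conn.executemany(
    'INSERT INTO "Groups" VALUES (?, ?, ?)',
    [(1, "ELECTRONICS", None), (2, "MOBILE", "ELECTRONICS"),
     (3, "SAMSUNG", "MOBILE"), (4, "16GB", "SAMSUNG"), (5, "8GB", "SAMSUNG")],
)

# Leaf rows: names that never occur as a (non-NULL) parent.
leaves = conn.execute(
    '''
    SELECT name FROM "Groups"
    WHERE name NOT IN (SELECT parent FROM "Groups" WHERE parent IS NOT NULL)
    ORDER BY id
    '''
).fetchall()

# Without the filter, the NULL parent of ELECTRONICS makes every
# NOT IN comparison unknown, so no rows qualify.
no_filter = conn.execute(
    'SELECT name FROM "Groups" WHERE name NOT IN (SELECT parent FROM "Groups")'
).fetchall()

print(leaves)
print(no_filter)
```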
A recursive query can be used to find, for example, all nodes below a certain category. To find everything below SAMSUNG:
select *
from groups
start with name = 'SAMSUNG'
connect by prior name = parent;
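The same traversal can also be written as a standard recursive common table expression, which Oracle supports from 11g Release 2 (there it is written without the RECURSIVE keyword). A sketch of that swapped-in form, run in SQLite from Python with the sample data; the `lvl` column is added here only to mimic Oracle's LEVEL pseudocolumn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Groups" (id INTEGER, name TEXT, parent TEXT)')
conn.executemany(
    'INSERT INTO "Groups" VALUES (?, ?, ?)',
    [(1, "ELECTRONICS", None), (2, "MOBILE", "ELECTRONICS"),
     (3, "SAMSUNG", "MOBILE"), (4, "16GB", "SAMSUNG"), (5, "8GB", "SAMSUNG")],
)

subtree = conn.execute(
    '''
    WITH RECURSIVE subtree(name, lvl) AS (
        -- anchor member: the role of START WITH name = 'SAMSUNG'
        SELECT name, 1 FROM "Groups" WHERE name = 'SAMSUNG'
        UNION ALL
        -- recursive member: the role of CONNECT BY PRIOR name = parent
        SELECT g.name, s.lvl + 1
        FROM "Groups" AS g
        JOIN subtree AS s ON g.parent = s.name
    )
    SELECT name, lvl FROM subtree ORDER BY lvl, name
    '''
).fetchall()

print(subtree)
```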

Simple join between 3 tables takes lot of time in memsql

I ran the following query in MemSQL and MySQL, but the execution times are very different.
MemSQL:
select count(*) from po A , cu B , tsk C where A.customer_id = B.customer_id and B.taskid = C.id and A.domain = 5 and week(B.post_date) = 22;
+----------+
| count(*) |
+----------+
| 98952 |
+----------+
1 row in set (19.89 sec)
MySQL:
select count(*) from po A , cu B , tsk C where A.customer_id = B.customer_id and B.taskid = C.id and A.domain = 5 and week(B.post_date) = 22;
+----------+
| count(*) |
+----------+
| 98952 |
+----------+
1 row in set (0.50 sec)
Why does MemSQL perform so badly while MySQL is so fast?
Both MySQL and MemSQL run on the same 8 GB, quad-core machine. MemSQL has 1 master aggregator node and 3 leaf nodes.
Does MemSQL perform badly with joins?
UPDATE
The documentation makes clear that a table should have a shard key on the columns it is expected to be joined on often. This lets the optimizer minimize network traffic while executing the query.
So I think this is where I went wrong: instead of defining a shard key, I had added a simple primary key on the tables.
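A conceptual sketch of why a shared shard key helps, assuming simple hash partitioning across the 3 leaves from the question. This does not model MemSQL's actual hash function or placement logic, and the rows are made up. When both tables are partitioned on the join column, every matching pair lands on the same leaf, so each leaf can join its own rows without moving data across the network.

```python
from collections import defaultdict

NUM_LEAVES = 3  # matches the 3 leaf nodes described in the question


def leaf_for(key: int) -> int:
    """Hypothetical placement: hash the shard-key value onto a leaf."""
    return hash(key) % NUM_LEAVES


# Made-up (customer_id, payload) rows for two tables joined on customer_id.
po = [(cid, f"po-{cid}") for cid in range(10)]
cu = [(cid, f"cu-{cid}") for cid in range(0, 10, 2)]

# Shard both tables on the join column.
po_by_leaf = defaultdict(list)
cu_by_leaf = defaultdict(list)
for row in po:
    po_by_leaf[leaf_for(row[0])].append(row)
for row in cu:
    cu_by_leaf[leaf_for(row[0])].append(row)

# Each leaf joins only its local rows; nothing is shipped between leaves.
local_matches = []
for leaf in range(NUM_LEAVES):
    cu_keys = {cid for cid, _ in cu_by_leaf[leaf]}
    local_matches += [cid for cid, _ in po_by_leaf[leaf] if cid in cu_keys]

# The union of the local joins equals the global join result.
cu_all = {cid for cid, _ in cu}
global_matches = sorted(cid for cid, _ in po if cid in cu_all)
print(sorted(local_matches) == global_matches)
```

If the two tables were sharded on different columns, matching rows could sit on different leaves, and the engine would have to redistribute one side before joining.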
Have you tried running the query in MemSQL a second time?
MemSQL compiles and caches the query execution code the first time it sees a query; MemSQL calls this code generation.
http://docs.memsql.com/latest/concepts/codegen/
When you run the query again, you should see a considerable performance speedup.

Checking bidirectionality in SQL directed graph

In my Firebird database I have two tables. In the first one (LOCATIONS) I keep info about some locations (as in an RPG game); it is very simple and looks like this:
NAME
Location 1
Location 2
Location 3
Location 4
Location 5
The second table (CONNECTIONS) connects these locations, so I know where I can go from each location. Each row connects two locations in one direction only, so to create a two-way connection I have to insert two rows into the CONNECTIONS table.
Some example connections:
first_location|second_location
Location 1 | Location 2
Location 1 | Location 3
Location 2 | Location 1
Location 2 | Location 4
Location 3 | Location 1
Location 3 | Location 4
Location 4 | Location 5
Location 5 | Location 1
As you can see, these connections represent a directed graph.
I need a SQL query that shows all locations (graph nodes) I can go to from a given node and, in addition, whether I can come back from each of them to that node.
The first part is easy, because
select second_location
from connections
where connections.first_location = 'Location 1'
gives me all nodes connected to Location 1, but the problem starts when I try to find out whether these connections are bidirectional.
So far I've tried something like this:
select c.first_location as first, c.second_location as second, p.count
from connections as c
where c.first = 'Location 1'
inner join (
select count(*)
from connections
where connections.first_location = c.second
and connections.second_location = 'Location 1'
) as p
and I hoped to get a result like this:
first | second | count
Location 1 | Location 2 | 1
Location 1 | Location 3 | 0
but it doesn't work. What should I do to solve this problem?
As originally posted in the question by the asker:
I don't need help anymore; I solved this problem right after posting the question.
SELECT p.first_location, p.second_location,
       -- 1 if the reverse connection exists, 0 otherwise
       (SELECT COUNT(*)
          FROM connections AS q
         WHERE q.first_location = p.second_location
           AND q.second_location = p.first_location) AS back_count
FROM connections AS p
WHERE p.first_location = 'Location 1'
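You can solve this with a self join (in Firebird, selecting a boolean expression such as IS NOT NULL requires Firebird 3.0 or later):
SELECT fwd.first_location, fwd.second_location,
       bck.first_location IS NOT NULL AS returnable
FROM connections fwd LEFT JOIN connections bck
  ON (fwd.first_location = bck.second_location
      AND fwd.second_location = bck.first_location)
WHERE fwd.first_location = 'Location 1'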
http://sqlfiddle.com/#!12/ec191/13/0
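A runnable sketch of the self-join approach against the connections listed in the question, using SQLite from Python as a stand-in for Firebird; the helper name `exits` is hypothetical. With this data, both exits from Location 1 turn out to be two-way, while the only exit from Location 5 is one-way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (first_location TEXT, second_location TEXT)")
edges = [(1, 2), (1, 3), (2, 1), (2, 4), (3, 1), (3, 4), (4, 5), (5, 1)]
conn.executemany(
    "INSERT INTO connections VALUES (?, ?)",
    [(f"Location {a}", f"Location {b}") for a, b in edges],
)


def exits(conn, start):
    """Locations reachable in one step from `start`, with a 0/1 returnable flag."""
    return conn.execute(
        """
        SELECT fwd.first_location, fwd.second_location,
               bck.first_location IS NOT NULL AS returnable
        FROM connections AS fwd
        LEFT JOIN connections AS bck
               ON fwd.first_location = bck.second_location
              AND fwd.second_location = bck.first_location
        WHERE fwd.first_location = ?
        ORDER BY fwd.second_location
        """,
        (start,),
    ).fetchall()


print(exits(conn, "Location 1"))
print(exits(conn, "Location 5"))
```

The LEFT JOIN keeps every outgoing edge; the reverse edge either matches (flag 1) or leaves the `bck` columns NULL (flag 0).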

Matrix view generated from two selected columns of a table

Suppose I have a table with the columns Project_Type, Project_No and OS_Platform. There is a limited set of project types and a limited set of OS platforms. I want a database view that produces a matrix of Project_Type against OS_Platform.
MY_TABLE :
Project_Type Project_No OS_Platform
Drivers 345 Linux
WebService 453 Windows
Drivers 034 Windows
Drivers 953 Solaris
DesktopApp 840 Windows
WebService 882 Solaris
Taking Project_Type and OS_Platform as the selected columns, I want a matrix view of these two columns, with the distinct project types as rows and the distinct platforms as column names.
Project_Type Linux Windows Solaris
WebService null true true
Drivers true true true
DesktopApp null true null
Is this possible, and if so, how?
You could also try using the dedicated PIVOT feature if it is supported by the SQL product you are using. For instance, the following would work in SQL Server 2005+:
SELECT *
FROM (
SELECT DISTINCT
Project_Type,
'true' AS flag,
OS_Platform
FROM MY_TABLE
) s
PIVOT (
MAX(flag)
FOR OS_Platform IN (
Linux, Windows, Solaris
)
) p
;
Oracle Database also supports PIVOT (since Oracle 11g). You would be able to run the above query in Oracle after enclosing every value in the PIVOT's IN list in single quotes, like this:
... IN (
'Linux', 'Windows', 'Solaris'
)
...
This is basically a PIVOT query, where you transpose rows of data into columns. Since you want a true/null value, the easiest way to do this is with an aggregate function and a CASE expression:
select project_type,
max(case when os_platform ='Linux' then 'true' else null end) Linux,
max(case when os_platform ='Windows' then 'true' else null end) Windows,
max(case when os_platform ='Solaris' then 'true' else null end) Solaris
from yourtable
group by project_type
The result is:
| PROJECT_TYPE | LINUX | WINDOWS | SOLARIS |
---------------------------------------------
| DesktopApp | (null) | true | (null) |
| Drivers | true | true | true |
| WebService | (null) | true | true |
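To try the conditional-aggregation pivot locally, here is a sketch that runs it in SQLite from Python on the question's data (`yourtable` is the placeholder name from the answer). `Project_No` is stored as text so that values like 034 keep their leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (project_type TEXT, project_no TEXT, os_platform TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [("Drivers", "345", "Linux"), ("WebService", "453", "Windows"),
     ("Drivers", "034", "Windows"), ("Drivers", "953", "Solaris"),
     ("DesktopApp", "840", "Windows"), ("WebService", "882", "Solaris")],
)

# One MAX(CASE ...) per known platform: 'true' if any row matches, else NULL.
matrix = conn.execute(
    """
    SELECT project_type,
           MAX(CASE WHEN os_platform = 'Linux'   THEN 'true' END) AS Linux,
           MAX(CASE WHEN os_platform = 'Windows' THEN 'true' END) AS Windows,
           MAX(CASE WHEN os_platform = 'Solaris' THEN 'true' END) AS Solaris
    FROM yourtable
    GROUP BY project_type
    ORDER BY project_type
    """
).fetchall()

for row in matrix:
    print(row)
```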
You'll want to pivot/unpivot your values to transpose them into your format of choice.
Here's a Google search for pivot questions on Stack Overflow; any of these will serve you well: https://www.google.com/search?q=sql+pivot+unpivot+site%3Astackoverflow.com&oq=sql+pivot+unpivot+site%3Astackoverflow.com&aqs=chrome.0.57.9985&sugexp=chrome,mod=8&sourceid=chrome&ie=UTF-8
You'll see two types of answers there. The first is a regular pivot/unpivot operation. These work well (they are easy to write, though not necessarily fast) with known data sets: if you know all project types and platforms in advance, this approach is fine.
The second type is a dynamic pivot, that is, a pivot built with dynamic SQL. This is messier, but handles any set of values, including ones not known in advance.
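A minimal dynamic-pivot sketch, assuming SQLite from Python in place of the original database: it first reads the distinct platforms, then builds one `MAX(CASE ...)` column per platform. The platform values are passed as bound parameters and the column aliases are quoted with embedded double quotes doubled; the helper names `quote_ident` and `dynamic_pivot` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (project_type TEXT, project_no TEXT, os_platform TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [("Drivers", "345", "Linux"), ("WebService", "453", "Windows"),
     ("Drivers", "034", "Windows"), ("Drivers", "953", "Solaris"),
     ("DesktopApp", "840", "Windows"), ("WebService", "882", "Solaris")],
)


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def dynamic_pivot(conn):
    # Step 1: discover the column values at run time.
    platforms = [r[0] for r in conn.execute(
        "SELECT DISTINCT os_platform FROM yourtable ORDER BY os_platform")]
    # Step 2: one CASE column per platform; values are bound as parameters.
    cols = ", ".join(
        f"MAX(CASE WHEN os_platform = ? THEN 'true' END) AS {quote_ident(p)}"
        for p in platforms
    )
    sql = (f"SELECT project_type, {cols} FROM yourtable "
           "GROUP BY project_type ORDER BY project_type")
    cur = conn.execute(sql, platforms)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


header, pivot_rows = dynamic_pivot(conn)
print(header)
for row in pivot_rows:
    print(row)
```

Adding a new platform to the table adds a new column without any change to the query text, which is the trade-off the dynamic approach buys at the cost of building SQL strings.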
Good luck!