How can I combine the results of 2 queries into 1 output? - sql

Summary of the project: I am putting together a report on my local Chevy dealership from the numbers they publish online, so that when their sales numbers are low I can strike and lowball them to get a good deal.
Here's what I am currently trying to do:
I have two separate queries: one creates a cars sold report and the other creates a cars gained report.
EX:
Model                              # of vehicles sold
Chevrolet Silverado 1500 Custom    3
Chevrolet Silverado 1500 LT        22
Chevrolet Silverado 1500 LT (2FL)  5
Chevrolet Silverado 1500 LTZ       5
Table 2:
Model                              # of vehicles added
Chevrolet Silverado 1500 Custom    4
Chevrolet Silverado 1500 LT        2
Chevrolet Silverado 1500 LT (2FL)  3
Chevrolet Silverado 1500 LTZ       7
Is there any way I can combine the two so that instead of having 2 separate tables I can just have 1?
I don't think UNION is what I'm looking for,
and I can't get
Select
(query 1)
(query 2)
to work.
I don't know if what I am trying to do is achievable in SQL, but I figured it's worth a try.
Thanks!!!!

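SELECT t1.MODEL, t1.[# of vehicles sold], t2.[# of vehicles added]
FROM TABLE1 t1
INNER JOIN TABLE2 t2
    ON t1.MODEL = t2.MODEL
This assumes you are using T-SQL and that MODEL values are unique in each table. Note that an inner join drops any model that appears in only one of the two tables. Those column names are not good: because they contain spaces and "#", they have to be wrapped in brackets every time they are used.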
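To try the join without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table names sold and added and the column names model, num_sold and num_added are assumptions standing in for the two report queries; the rows are the sample data from the question.

```python
import sqlite3

# In-memory database; table and column names are hypothetical stand-ins
# for the output of the two report queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sold  (model TEXT PRIMARY KEY, num_sold  INTEGER);
    CREATE TABLE added (model TEXT PRIMARY KEY, num_added INTEGER);
""")
conn.executemany("INSERT INTO sold VALUES (?, ?)", [
    ("Chevrolet Silverado 1500 Custom", 3),
    ("Chevrolet Silverado 1500 LT", 22),
    ("Chevrolet Silverado 1500 LT (2FL)", 5),
    ("Chevrolet Silverado 1500 LTZ", 5),
])
conn.executemany("INSERT INTO added VALUES (?, ?)", [
    ("Chevrolet Silverado 1500 Custom", 4),
    ("Chevrolet Silverado 1500 LT", 2),
    ("Chevrolet Silverado 1500 LT (2FL)", 3),
    ("Chevrolet Silverado 1500 LTZ", 7),
])

# One row per model, with both counts side by side.
combined = conn.execute("""
    SELECT s.model, s.num_sold, a.num_added
    FROM sold AS s
    INNER JOIN added AS a ON a.model = s.model
    ORDER BY s.model
""").fetchall()

for row in combined:
    print(row)
```

If the two reports only exist as queries, each table in the FROM clause can be replaced by its query as a derived table, along the lines of FROM (query 1) AS s INNER JOIN (query 2) AS a ON a.model = s.model.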

How do I join or concat 2 dataframes where I get a new column for each row where the left_on/right_on key is the same?

Given 2 dataframes:
DF1
ID   Name
123  Jim
456  Bob
DF2
record_id  model_year  make_desc  model_desc  vin
123        2008        Chevy      Tahoe       cvin
456        2020        Hyundai    Elantra     hvin
456        2018        Ford       F-150       fvin
I want to merge/join/groupby (I'm not really sure which) so that the result is:
ID   Name  model_year1  make_desc1  model_desc1  vin1
123  Jim   2008         Chevy       Tahoe        cvin
456  Bob   2020         Hyundai     Elantra      hvin
model_year2  make_desc2  model_desc2  vin2
2008         Chevy       Tahoe        cvin
2018         Ford        F-150        fvin
(The second table of results is just more columns of the first table; I couldn't figure out the markup.)
So it is kind of like a join: I need to be able to join data on a value,
but I want to add columns instead of adding rows when there are multiple matches,
and the number of matches can't be known up front, so it could need to add 10 columns.
I tried a horizontal concat, but it doesn't seem to match on value.
I have also read up a bunch on groupby, but I can't get it.
Any help would be appreciated.
I didn't find a straightforward way. Please try the approach explained and coded below, shown with a single repeated column, JobCode:
df3 = pd.merge(df1, df2, how='left', on='ID')  # merge the two dfs
df3 = df3.groupby(['ID', 'Name'])['JobCode'].unique().reset_index()  # collect JobCode values per ID
df3[['JobCode', 'JobCode_x']] = pd.DataFrame(df3['JobCode'].tolist(), index=df3.index)  # create the required columns
    ID Name JobCode JobCode_x
0  123  Jim     H1B      None
1  456  Bob     H1B       H2B
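For the frames in the question itself (keys ID and record_id, and several repeated fields), one possible sketch is to number the matches within each key and then move that number into the column names. This is an alternative to the approach above, not a correction of it. The helper names match, numbered, wide and result are made up for illustration, and the number of generated column sets simply follows the largest number of matches in the data.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [123, 456], "Name": ["Jim", "Bob"]})
df2 = pd.DataFrame({
    "record_id": [123, 456, 456],
    "model_year": [2008, 2020, 2018],
    "make_desc": ["Chevy", "Hyundai", "Ford"],
    "model_desc": ["Tahoe", "Elantra", "F-150"],
    "vin": ["cvin", "hvin", "fvin"],
})

# Number the matches within each record_id: 1, 2, 3, ...
numbered = df2.assign(match=df2.groupby("record_id").cumcount() + 1)

# Move the match number from the rows into the columns.
wide = numbered.set_index(["record_id", "match"]).unstack("match")

# Flatten the (field, match) column pairs into names such as "vin2".
wide.columns = [f"{field}{match}" for field, match in wide.columns]

# Attach the widened columns to df1; IDs with fewer matches get NaN.
result = df1.merge(wide, left_on="ID", right_index=True, how="left")
print(result)
```

The generated columns come out grouped by field (model_year1, model_year2, make_desc1, ...), so they may need reordering if the layout in the question matters.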

Pig script: loop through to calculate averages

I have data that will be processed in Pig on AWS EMR. The columns are called model, year, units_sold, total_customers, and the data looks like this:
chevy 1900 1000 49
chevy 1901 73 92
chevy 1902 45 65
chevy 1903 300 75
ford 1900 35 12
ford 1901 777 32
ford 1902 932 484
ford 1903 33 15
What I am trying to do is calculate the average for every car type. The average is the sum of units_sold divided by the sum of total_customers,
so the desired result would look like
chevy (1000+73+45+300) / (49+92+65+75) = 5.04
ford (35+777+932+33) / (12+32+484+15) = 3.27
in my script i have
A = *Step to load data*;
B = GROUP A by year;
C = results = FOREACH B GENERATE SUM(units_sold)/SUM(total_customers);
dump C;
This returns an incorrect result. How can I achieve results that look like
chevy 5.04
ford 3.27
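It looks like you need to group by car type (model), not year. After grouping, the original fields live inside the bag A, so they are referenced as A.units_sold and A.total_customers, and emitting group keeps the model name in the output. You may also need to cast to float if units_sold and total_customers are integers and you don't want a truncated result. Try:
B = GROUP A BY model;
C = FOREACH B GENERATE group AS model, (float)SUM(A.units_sold) / (float)SUM(A.total_customers) AS avg_units;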
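As a cross-check on the numbers the Pig script should produce, here is a small plain-Python sketch of the same calculation (sum of units_sold divided by sum of total_customers, per model) on the sample rows from the question. It is not Pig code; the variable names are chosen only for illustration.

```python
from collections import defaultdict

# (model, year, units_sold, total_customers), sample rows from the question
rows = [
    ("chevy", 1900, 1000, 49),
    ("chevy", 1901, 73, 92),
    ("chevy", 1902, 45, 65),
    ("chevy", 1903, 300, 75),
    ("ford", 1900, 35, 12),
    ("ford", 1901, 777, 32),
    ("ford", 1902, 932, 484),
    ("ford", 1903, 33, 15),
]

# Accumulate both sums per model (the equivalent of GROUP A BY model).
units = defaultdict(int)
customers = defaultdict(int)
for model, _year, units_sold, total_customers in rows:
    units[model] += units_sold
    customers[model] += total_customers

# True division keeps the fractional part, like the float cast in Pig.
averages = {model: units[model] / customers[model] for model in units}

for model, avg in averages.items():
    print(model, avg)
```

On this data the ratios are 1418/281 (about 5.046) for chevy and 1777/543 (about 3.273) for ford.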

I need the top x most recent (by SALEDT) rows grouped by neighborhood (NBHD)

I'm using Microsoft Access and I need a SQL query to return the top x (40 in my case) most recent sales for each neighborhood (NBHD). My data looks something like this:
PARID PRICE SALEDT SALEVAL NBHD
04021000 140000 1/29/2016 11 700
04021000 160000 2/16/2016 11 700
04018470 250000 4/23/2015 08 701
04018470 300000 4/23/2015 08 701
04016180 40000 5/9/2017 11 705
04023430 600000 6/12/2017 19 700
What I need is the top 40 most recent SALEDT entries for each NBHD. If the same PARID would show up in that top 40 twice or more, I only want the most recent one. If rows have the same PARID and the same SALEDT, I need only the most expensive one. For this small set of sample data, I would get:
PARID PRICE SALEDT SALEVAL NBHD
04021000 160000 2/16/2016 11 700
04023430 600000 6/12/2017 19 700
04018470 300000 4/23/2015 08 701
04016180 40000 5/9/2017 11 705
I get row 2 (as it has a later SALEDT than row 1), row 4 (as it has a higher PRICE than row 3), row 5, and row 6. Hopefully that is clear. Also, I'm using MS Access SQL to do this, but I wouldn't be opposed to a VBA solution if that is easier. Thanks in advance.
Here you go:
select a.parid, max(a.price) as price, a.saledt, a.saleval, a.nbhd
from #table a
join (
    select parid, max(saledt) as saledt
    from #table
    group by parid
) b on a.parid = b.parid and a.saledt = b.saledt
group by a.parid, a.saledt, a.saleval, a.nbhd
order by a.nbhd
In MS Access, you can do the following to get the 40 most recent entries for each neighborhood:
select t.*
from t
where t.saledt in (select top 40 t2.saledt
                   from t as t2
                   where t2.nbhd = t.nbhd
                   order by t2.saledt desc
                  );
Your additional constraints are rather confusing. I'm not sure I fully follow them because I don't know what the columns really refer to.
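To make the extra constraints concrete, here is a plain-Python sketch of one reading of them: first keep a single row per PARID (latest SALEDT, highest PRICE on a tie), then keep the 40 most recent of those rows per NBHD. This ordering of the two steps is an assumption about what the question means, and the code is not Access SQL or VBA. The names TOP_N, sort_key, best and top are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

TOP_N = 40  # number of sales to keep per neighborhood

# (PARID, PRICE, SALEDT, SALEVAL, NBHD), sample rows from the question
sales = [
    ("04021000", 140000, "1/29/2016", "11", 700),
    ("04021000", 160000, "2/16/2016", "11", 700),
    ("04018470", 250000, "4/23/2015", "08", 701),
    ("04018470", 300000, "4/23/2015", "08", 701),
    ("04016180", 40000, "5/9/2017", "11", 705),
    ("04023430", 600000, "6/12/2017", "19", 700),
]

def sort_key(row):
    """Order rows by sale date, then by price to break ties."""
    return (datetime.strptime(row[2], "%m/%d/%Y"), row[1])

# Step 1: one row per PARID, the latest sale (highest price on a tie).
best = {}
for row in sales:
    parid = row[0]
    if parid not in best or sort_key(row) > sort_key(best[parid]):
        best[parid] = row

# Step 2: group the surviving rows by NBHD and keep the TOP_N most recent.
by_nbhd = defaultdict(list)
for row in best.values():
    by_nbhd[row[4]].append(row)

top = {
    nbhd: sorted(rows, key=sort_key, reverse=True)[:TOP_N]
    for nbhd, rows in sorted(by_nbhd.items())
}

for nbhd, rows in top.items():
    for row in rows:
        print(nbhd, row)
```

On the sample data this keeps the same four rows as the expected output in the question, although within each neighborhood they are listed most recent first.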

Finding duplicates using SQL

I am trying to write a query that will display all duplicates in a table.
I have a table, let's call it WORKERS. This table has multiple columns; the two I am focusing on are called SocialSecurityNbr and EmpNbr.
I would like the query to display all rows where
SocialSecNbr == SocialSecNbr
AND
EmpNbr != EmpNbr
Below I have an example of my data followed by what I want the output to show. (for the simplicity of this question I have only used 4 digits to represent the social security number)
ID EmpNbr SocialSecNbr EmpName
1 00001 9711 Smith,John
2 00002 5789 Harris, Greg
3 00001 9711 Smith,John
4 00003 4100 Thompson,Lisa
5 00004 1250 Fulton,Kyle
6 00005 3999 Harris, Amber
7 00004 1250 Fulton,Kyle
8 00007 1250 Morlan,Richard
9 00008 3999 Levy,Harold
What I would like to see as the output:
ID EmpNbr SocialSecurityNbr EmpName
5 00004 1250 Fulton,Kyle
6 00005 3999 Harris, Amber
7 00004 1250 Fulton,Kyle
8 00007 1250 Morlan,Richard
9 00008 3999 Levy,Harold
As you can see above, all of the duplicate Social Security numbers are shown in the output, except for John Smith. In the actual table there are many instances where the same person is shown more than once; this is fine and I do not want to see these in the output.
I have searched online for information on how to do this but all I found was examples using "Count > 1". I'm thinking I need to use "Distinct" however I do not believe that I can apply that function to just one column.
At least for MS SQL, joining the table with itself will work:
select distinct w1.ID, w1.EmpNbr, w1.SocialSecNbr, w1.EmpName
from WORKERS w1
inner join WORKERS w2
    on w1.SocialSecNbr = w2.SocialSecNbr
    and w1.EmpNbr <> w2.EmpNbr
It should work in other SQL flavours as well.
See sample at SqlFiddle
You can approach this using an exists clause:
select ID, EmpNbr, SocialSecurityNbr, EmpName
from workers w
where exists (select 1
from workers w2
where w2.SocialSecurityNbr = w.SocialSecurityNbr and
w2.EmpNbr <> w.EmpNbr
);
With an index on workers(SocialSecurityNbr, EmpNbr), this should be relatively efficient.
The query below will show you all rows in Workers where the SocialSecurityNbr has "duplicates" (as defined by having multiple different EmpNames).
SELECT *
FROM Workers
WHERE SocialSecurityNbr IN (
    SELECT SocialSecurityNbr
    FROM Workers
    GROUP BY SocialSecurityNbr
    HAVING COUNT(DISTINCT EmpName) > 1
)
You could easily modify this to change your definition of a "duplicate": for example, use COUNT(DISTINCT EmpNbr) to require multiple different employee numbers, which is the definition used in the question.
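To check these queries against the sample data, here is a minimal sketch that loads the question's rows into an in-memory SQLite database through Python's sqlite3 module and runs the EXISTS variant. The column is named SocialSecNbr as in the sample table header; SQLite is used only for convenience and is an assumption, not the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE workers (
        ID           INTEGER PRIMARY KEY,
        EmpNbr       TEXT,
        SocialSecNbr TEXT,
        EmpName      TEXT
    )
""")
conn.executemany("INSERT INTO workers VALUES (?, ?, ?, ?)", [
    (1, "00001", "9711", "Smith,John"),
    (2, "00002", "5789", "Harris, Greg"),
    (3, "00001", "9711", "Smith,John"),
    (4, "00003", "4100", "Thompson,Lisa"),
    (5, "00004", "1250", "Fulton,Kyle"),
    (6, "00005", "3999", "Harris, Amber"),
    (7, "00004", "1250", "Fulton,Kyle"),
    (8, "00007", "1250", "Morlan,Richard"),
    (9, "00008", "3999", "Levy,Harold"),
])

# Keep a row when another row shares its SSN but has a different EmpNbr.
duplicates = conn.execute("""
    SELECT w.ID, w.EmpNbr, w.SocialSecNbr, w.EmpName
    FROM workers AS w
    WHERE EXISTS (
        SELECT 1
        FROM workers AS w2
        WHERE w2.SocialSecNbr = w.SocialSecNbr
          AND w2.EmpNbr <> w.EmpNbr
    )
    ORDER BY w.ID
""").fetchall()

for row in duplicates:
    print(row)
```

The rows returned are IDs 5 through 9, matching the desired output; the two John Smith rows are left out because they share both the number and the EmpNbr.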

MS Access, Excel, SQL, and New Tables

I'm just starting out with MS Access 2010 and have the following setup: 3 Excel files, masterlist.x (which contains every product that I sell), vender1.x (which contains all products from vender1; I only sell some of these products), and vender2.x (again, contains all products from vender2; I only sell some of these products). Here's an example data collection:
masterlist.x
ID NAME PRICE
23 bananas .50
33 apples .75
35 nuts .87
38 raisins .25
vender1.x
ID NAME PRICE
23 bananas .50
25 pears .88
vender2.x
ID NAME PRICE
33 apples .75
35 nuts .87
38 raisins .25
49 kiwis .88
The vender lists get periodically updated with new items for sale and new prices. For example, if vender1 raises the price on bananas to $.75, my masterlist.x would need to be updated to reflect this.
Where I'm at now: I know how to import the 3 Excel files into Access. From there, I've been researching whether I need to set up relationships, create a macro, or write a SQL query to accomplish my goals. I'm not necessarily looking for a solution, but being pointed in the right direction would be great!
Also, once the masterlist.x table is updated, what feature would I use to see which line items were affected?
Update: discovered SQL /JOIN/ and have the following:
SELECT * FROM master
LEFT JOIN vender1
ON master.ID = vender1.ID
where master.PRICE <> vender1.PRICE;
This gives me the output (for the above scenario)
ID NAME PRICE ID NAME PRICE
23 bananas .50 23 bananas .75
What feature would instead give me:
masterlist.x
ID NAME PRICE
23 bananas .75
33 apples .75
35 nuts .87
38 raisins .25
Since you asked to be pointed in the right direction, here are some design ideas. I don't really fancy your current table schema. The following queries were built in SQL Server 2008, the nearest syntax to MS Access SQL that I could get in SQLFiddle.
Please take a look:
SQLFIDDLE DEMO
Proposed table design:
vendor table:
VID VNAME
1 smp farms
2 coles
3 cold str
4 Anvil NSW
product table:
PID VID PNAME PPRICE
203 2 bananas 0.5
205 2 pears 0.88
301 3 bananas 0.78
303 3 apples 0.75
305 3 nuts 0.87
308 3 raisins 0.25
409 4 kiwis 0.88
masterlist:
ID PID MPRICE
1 203 0.5
2 303 0.75
3 305 0.87
4 308 0.25
Join queries can now easily update your masterlist, for example when a vendor updates the prices of the products they provide you, or when they stop supplying a product. You can add WHERE clauses to the query for whatever conditions you need.
Query:
SELECT m.id, p.vid, p.pname, p.pprice
FROM masterlist m
LEFT JOIN product p ON p.pid = m.pid
;
Results:
ID VID PNAME PPRICE
1 2 bananas 0.5
2 3 apples 0.75
3 3 nuts 0.87
4 3 raisins 0.25
Please comment if you have any doubts; I'm happy to help.
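For the original three-table layout in the question, the operation being asked about is an update query. The following is a minimal sketch of the logic using Python's sqlite3 module with the table names master and vender1 from the asker's own JOIN query. It first lists the line items whose price differs and then copies the vendor price across. Access writes this differently (its update queries take the form UPDATE ... INNER JOIN ... SET ...), so treat this as an illustration of the logic rather than Access SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master  (ID INTEGER PRIMARY KEY, NAME TEXT, PRICE REAL);
    CREATE TABLE vender1 (ID INTEGER PRIMARY KEY, NAME TEXT, PRICE REAL);
""")
conn.executemany("INSERT INTO master VALUES (?, ?, ?)", [
    (23, "bananas", 0.50), (33, "apples", 0.75),
    (35, "nuts", 0.87), (38, "raisins", 0.25),
])
# vender1 after raising the price of bananas to .75
conn.executemany("INSERT INTO vender1 VALUES (?, ?, ?)", [
    (23, "bananas", 0.75), (25, "pears", 0.88),
])

# Line items that the update will touch (price differs from the vendor's).
changed = conn.execute("""
    SELECT m.ID, m.NAME, m.PRICE AS old_price, v.PRICE AS new_price
    FROM master AS m
    INNER JOIN vender1 AS v ON v.ID = m.ID
    WHERE m.PRICE <> v.PRICE
""").fetchall()

# Copy the vendor price onto the matching master rows only.
conn.execute("""
    UPDATE master
    SET PRICE = (SELECT v.PRICE FROM vender1 AS v WHERE v.ID = master.ID)
    WHERE EXISTS (
        SELECT 1
        FROM vender1 AS v
        WHERE v.ID = master.ID AND v.PRICE <> master.PRICE
    )
""")

updated = conn.execute(
    "SELECT ID, NAME, PRICE FROM master ORDER BY ID"
).fetchall()
print(changed)
print(updated)
```

Running the SELECT before the UPDATE is what answers the "which line items were affected" part: the same join condition identifies the rows in both statements.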