Approaching SQL query building - sql

I'm unsure what method to use in creating a query. Each week I need to pull a Count of invoices grouped by status types and the most recent Invoice entered.
I have many vendor tables that store sales records and I create a report each week that pulls the following.
Select Invoice_status, COUNT(Invoice_status) As Total, Max(Invoice_date)
From VendorABCRecordsTable
Group By Invoice_Status
Results for each vendor
|Invoice_status| Total | column3
I run this for VendorABCRecordsTable, Vendor123RecordsTable, VendorXYZRecordsTable and copy paste the results to a spread sheet.
How would I write it so the results would come out
Vendor | Invoice_status | Total | column3
VendorABC |
Vendor123 |
VendorXYZ |

SELECT
'VendorABC' Vendor
,Invoice_status
,COUNT(Invoice_status) Total
,MAX(Invoice_date) MaxInvoiceDate
from VendorABCRecordsTable
group by Invoice_Status
UNION ALL SELECT
'Vendor123' Vendor
,Invoice_status
,COUNT(Invoice_status) Total
,MAX(Invoice_date) MaxInvoiceDate
from Vendor123RecordsTable
group by Invoice_Status
UNION ALL SELECT
'VendorXYZ' Vendor
,Invoice_status
,COUNT(Invoice_status) Total
,MAX(Invoice_date) MaxInvoiceDate
from VendorXYZRecordsTable
group by Invoice_Status
Note, UNION ALL, as there will be no duplicate rows. If it needs to be ordered somehow, add an ORDER BY clause to (only!) the last query. (And you can name it as "Column3", if necessary.)

Definitely not optimized schema. However, perhaps a UNION query will help you deal with this.
SELECT *, "ABC" AS Vendor FROM VendorABCRecordsTable
UNION SELECT *, "123" FROM Vendor123RecordsTable
UNION SELECT *, "XYZ" FROM VendorXYZRecordsTable;
In order to use the * wildcard, fields must be in same order and same data types and same number of fields in table design. Otherwise, explicitly list fields in each SELECT. There is a limit of 50 SELECT lines. First SELECT line defines field names and data types.
Now use that query in subsequent queries.
Cannot edit data via UNION query.
Could specify fields to do aggregate calcs and grouping in each SELECT line. However, building UNION with raw data provides a dataset that can serve as source for various manipulations of data. This UNION is essentially what properly designed table would be like.

Related

How to use LIMIT and IN together to have a default row in SQL?

I am exploring SQL with W3School page and I have this requirements where I need to limit the query to a certain number but also having a default row included with that limit.
Here I want a default row where the customer name is Alfreds, then grab the remaining 29 rows to complete the query regardless of what their name is.
I tried to look on other SO question but they are too complicated to understand and using different syntax.
What you are looking for is a specific order clause.
Try this
SELECT * FROM Customers order by (case when CustomerName in ('Alfreds Futterkiste') then 0 else CustomerId end) limit 30 ;
If you're going to have a default row in SQL you should really have that row in the table with a known primary key, and then UNION it onto your query:
--default row, that is always included as long as the table has a PK 1
SELECT *
FROM Customers
WHERE CustomerId = 1
UNION ALL
--other rows, a variable number of
SELECT *
FROM Customers
WHERE CustomerId <> 1 AND ...
LIMIT 30
The limit presented in this way applies to the result of the Union
If you ever want to do something where you're unioning together limited sets in other combinations you might want to look at eg a form like
(... LIMIT 2)
UNION ALL
(... LIMIT 28)
Use UNION to combine the two queries.
SELECT *
FROM Customers
WHERE CustomerName != 'Alfredo Futterkiste'
LIMIT 9
UNION
SELECT *
FROM Customers
WHERE CustomerName = 'Alfreo Futterkiste'

How to aggregate data stored column-wise in a matrix table

I have a table, Ellipses (...), represent multiple columns of a similar type
TABLE: diagnosis_info
COLUMNS: visit_id,
patient_diagnosis_code_1 ...
patient_diagnosis_code_100 -- char(100) with a value of ‘0’ or ‘1’
How do I find the most common diagnosis_code? There are 101 columns including the visit_id. The table is like a matrix table of 0s and 1s. How do I write something that can dynamically account for all the columns and count all the rows where the value is 1?
What I would normally do is not feasable as there are too many columns:
SELECT COUNT(patient_diagnostic_code_1), COUNT(patient_diagnostic_code_2),... FROM diagnostic_info WHERE patient_diagnostic_code_1 = ‘1’ and patient_diagnostic_code_2 = ‘1’ and ….
Then even if I typed all that out how would I select which column had the highest count of values = 1. The table is more column oriented instead of row oriented.
Unfortunately your data design is bad from the start. Instead it could be as simple as:
patient_id, visit_id, diagnosis_code
where a patient with 1 dignostic code would have 1 row, a patient with 100 diagnostic codes 100 rows and vice versa. At any given time you could transpose this into the format you presented (what is called a pivot or cross tab). Also in some databases, for example postgreSQL, you could put all those diagnostic codes into an array field, then it would look like:
patient_id, visit_id, diagnosis_code (data type -bool or int- array)
Now you need the reverse of it which is called unpivot. On some databases like SQL server there is UNPIVOT as an example.
Without knowing what your backend this, you could do that with an ugly SQL like:
select code, pdc
from
(
select 1 as code, count(*) as pdc
from myTable where patient_diagnosis_code_1=1
union
select 2 as code, count(*) as pdc
from myTable where patient_diagnosis_code_2=1
union
...
select 100 as code, count(*) as pdc
from myTable where patient_diagnosis_code_100=1
) tmp
order by pdc desc, code;
PS: This would return all the codes with their frequency ordered from most to least. You could limit to get 1 to get the max (with ties in case there are more than one code to match the max).

How to Select latest record in table of Azure SQL?

I have Azure SQL database. There are 100 rows in table.
I have columns CustomerName, SalesAmount, SalesTime in CustomerSales table.
'Nissan','20000','2021-11-19 17:00:27.9866667'
'Nissan','25000','2021-11-02 17:00:27.9866667'
'Tesla' ,'60000','2021-11-01 17:00:27.9866667'
...
I would like make select query which returns single row of latest like
'Nissan','20000','2021-11-19 17:00:27.9866667'
How?
Order the rows in descending order by sales time, and then get first returned row only.
For example:
select top 1 *
from t
order by SalesTime DESC

Group by question in SQL Server, migration from MySQL

Failed finding a solution to my problem, would love your help.
~~ Post has been edited to have only one question ~~-
Group by one query while selecting multiple columns.
In MySQL you can simply group by whatever you want, and it will still select all of them, so if for example I wanted to select the newest 100 transactions, grouped by Email (only get the last transaction of a single email)
In MySQL I would do that:
SELECT * FROM db.transactionlog
group by Email
order by TransactionLogId desc
LIMIT 100;
In SQL Server its not possible, googling a bit suggested to specify each column that I want to have with an aggregate as a hack, that couldn't cause a mix of values (mixing columns between the grouped rows)?
For example:
SELECT TOP(100)
Email,
MAX(ResultCode) as 'ResultCode',
MAX(Amount) as 'Amount',
MAX(TransactionLogId) as 'TransactionLogId'
FROM [db].[dbo].[transactionlog]
group by Email
order by TransactionLogId desc
TransactionLogId is the primarykey which is identity , ordering by it to achieve the last inserted.
Just want to know that the ResultCode and Amount that I'll get doing such query will be of the last inserted row, and not the highest of the grouped rows or w/e.
~Edit~
Sample data -
row1:
Email : test#email.com
ResultCode : 100
Amount : 27
TransactionLogId : 1
row2:
Email: test#email.com
ResultCode:50
Amount: 10
TransactionLogId: 2
Using the sample data above, my goal is to get the row details of
TransactionLogId = 2.
but what actual happens is that I get a mixed values of the two, as I do get transactionLogId = 2, but the resultcode and amount of the first row.
How do I avoid that?
Thanks.
You should first find out which is the latest transaction log by each email, then join back against the same table to retrieve the full record:
;WITH MaxTransactionByEmail AS
(
SELECT
Email,
MAX(TransactionLogId) as LatestTransactionLogId
FROM
[db].[dbo].[transactionlog]
group by
Email
)
SELECT
T.*
FROM
[db].[dbo].[transactionlog] AS T
INNER JOIN MaxTransactionByEmail AS M ON T.TransactionLogId = M.LatestTransactionLogId
You are currently getting mixed results because your aggregate functions like MAX() is considering all rows that correspond to a particular value of Email. So the MAX() value for the Amount column between values 10 and 27 is 27, even if the transaction log id is lower.
Another solution is using a ROW_NUMBER() window function to get a row-ranking by each Email, then just picking the first row:
;WITH TransactionsRanking AS
(
SELECT
T.*,
MostRecentTransactionLogRanking = ROW_NUMBER() OVER (
PARTITION BY
T.Email -- Start a different ranking for each different value of Email
ORDER BY
T.TransactionLogId DESC) -- Order the rows by the TransactionLogID descending
FROM
[db].[dbo].[transactionlog] AS T
)
SELECT
T.*
FROM
TransactionsRanking AS T
WHERE
T.MostRecentTransactionLogRanking = 1

SQL - Insert using Column based on SELECT result

I currently have a table called tempHouses that looks like:
avgprice | dates | city
dates are stored as yyyy-mm-dd
However I need to move the records from that table into a table called houses that looks like:
city | year2002 | year2003 | year2004 | year2005 | year2006
The information in tempHouses contains average house prices from 1995 - 2014.
I know I can use SUBSTRING to get the year from the dates:
SUBSTRING(dates, 0, 4)
So basically for each city in tempHouses.city I need to get the the average house price from the above years into one record.
Any ideas on how I would go about doing this?
This is an SQL Server approach, and a PIVOT may be a better, but here's one way:
SELECT City,
AVG(year2002) AS year2002,
AVG(year2003) AS year2003,
AVG(year2004) AS year2004
FROM (
SELECT City,
CASE WHEN Dates BETWEEN '2002-01-01T00:00:00' AND '2002-12-31T23:59:59' THEN avgprice
ELSE 0
END AS year2002,
CASE WHEN Dates BETWEEN '2003-01-01T00:00:00' AND '2003-12-31T23:59:59' THEN avgprice
ELSE 0
END AS year2003
CASE WHEN Dates BETWEEN '2004-01-01T00:00:00' AND '2004-12-31T23:59:59' THEN avgprice
ELSE 0
END AS year2004
-- Repeat for each year
)
GROUP BY City
The inner query gets the data into the correct format for each record (City, year2002, year2003, year2004), whilst the outer query gets the average for each City.
There many be many ways to do this, and performance may be the deciding factor on which one to choose.
The best way would be to use a script to perform the query execution for you because you will need to run it multiple times and you extract the data based on year. Make sure that the only required columns are city & row id:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
INSERT INTO <table> (city) VALUES SELECT DISTINCT `city` from <old_table>;
Then for each city extract the average values, insert them into a temporary table and then insert into the main table.
SELECT avg(price), substring(dates, 0, 4) dates from <old_table> GROUP BY dates;
Otherwise you're looking at a combination query using joins and potentially unions to extrapolate the data. Because you're flattening the table into a single row per city it's going to be a little tough to do. You should create indexes first on the date column if you don't want the database query to fail with memory limits or just take a very long time to execute.