Multiple query data in ms access - sql

I have a table in an .accdb file that consists of several columns, including a social security number, several dates and monetary values. I am trying to query this data (the table has over 600,000 rows).
A social security number can appear once or several times in the table. The dates and monetary values in the other columns of the same row may or may not differ between those rows.
So let's say my table looks like this:
Ssn    | Date1    | Date2    | moneyvalue | PostDate
123455 | 12-01-20 | 03-04-20 | 5.21       | (a datetime value)
I am trying to do several things:
First, I want to select only the ssn values that appear at least twice in the table.
From those results, I want only the rows where date1 is equal to date2.
From those results, I want the ssn values that have different values in moneyvalue.
I want to compare each moneyvalue for an ssn to the moneyvalue from the first time this ssn appears in the table (the row with the oldest datetime in PostDate) and return this ssn if the moneyvalue is different.
Is this possible? How would I go about this? I have to do this from within the MS Access SQL window; I can't export the database to MS SQL Server because it is protected.
So to sum it up:
I want to retrieve all ssn values that appear twice or more in the table, where date1 is equal to date2, and where the monetary value in record x does not match the monetary value in the row for the same ssn with the oldest PostDate.

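Your question suggests aggregation with multiple conditions in the HAVING clause. Access SQL does not support COUNT(DISTINCT ...), so MIN(moneyvalue) <> MAX(moneyvalue) is used to test for more than one distinct money value:
select ssn
from mytable
group by ssn
having
    count(*) > 1
    and sum(iif(date1 = date2, 1, 0)) > 1
    and min(moneyvalue) <> max(moneyvalue)
Another interpretation is a WHERE clause on the condition date1 = date2:
select ssn
from mytable
where date1 = date2
group by ssn
having
    count(*) > 1
    and min(moneyvalue) <> max(moneyvalue)
However, the two queries are not equivalent: the first checks the row count and the money values across all rows of an ssn (requiring at least two of them to have date1 = date2), while the second discards rows where date1 <> date2 before grouping. My understanding is that the first one is what you asked for.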
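Neither query covers the last requirement: comparing each row against the row with the oldest PostDate for the same ssn. Below is a minimal sketch of one way to do that. It uses Python's built-in sqlite3 module with an in-memory table purely as a stand-in for Access, and the sample rows are made up. The SQL sticks to a comma join plus a correlated MIN subquery, which Access SQL also accepts, but it has only been exercised here against SQLite. A row whose moneyvalue differs from the first row cannot be that first row, so any ssn it returns automatically appears at least twice.

```python
import sqlite3

# Made-up sample rows: (ssn, date1, date2, moneyvalue, postdate)
SAMPLE_ROWS = [
    (111, "2020-01-12", "2020-01-12", 5.21, "2020-01-01 08:00"),  # first post of 111
    (111, "2020-03-04", "2020-03-04", 6.00, "2020-02-01 08:00"),  # differs -> 111 returned
    (222, "2020-01-01", "2020-01-01", 3.00, "2020-01-01 08:00"),
    (222, "2020-01-02", "2020-01-02", 3.00, "2020-02-01 08:00"),  # same value -> not returned
    (333, "2020-01-01", "2020-01-01", 1.00, "2020-01-01 08:00"),  # appears only once
    (444, "2020-01-01", "2020-01-01", 2.00, "2020-01-01 08:00"),
    (444, "2020-01-05", "2020-02-05", 9.00, "2020-03-01 08:00"),  # date1 <> date2 -> ignored
]

# t = candidate row, f = the row with the oldest postdate for the same ssn.
QUERY = """
SELECT DISTINCT t.ssn
FROM mytable AS t, mytable AS f
WHERE f.ssn = t.ssn
  AND f.postdate = (SELECT MIN(p.postdate) FROM mytable AS p WHERE p.ssn = t.ssn)
  AND t.date1 = t.date2
  AND t.moneyvalue <> f.moneyvalue
ORDER BY t.ssn
"""


def changed_ssns(rows):
    """Return ssns with a date1 = date2 row whose money value differs from the first post."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE mytable (ssn INTEGER, date1 TEXT, date2 TEXT, "
            "moneyvalue REAL, postdate TEXT)"
        )
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
        return [row[0] for row in con.execute(QUERY)]
    finally:
        con.close()


print(changed_ssns(SAMPLE_ROWS))  # [111]
```

If two rows for the same ssn share the oldest PostDate, both are treated as the first row, so you may want a tie-breaker column.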

Related

Grouping and null values in column

I need some help fixing a problem.
Below is my input data. I am grouping by the name field; the query I am currently using for grouping is given below.
select name from Table
group by name having count(distinct DOB)='1'
But the problem is that the above query won't fetch a group if the DOB field is null for all records within it. If I substitute a dummy value for a null DOB, the query won't fetch the result for the first two rows, and if I don't use a dummy value, it won't fetch the records in rows 3 and 4.
I tried something like this, but it is wrong:
select name from Table
group by name having count(distinct case when DOB is null then '9999-01-01' else DOB END)='1'
Could someone help with some suggestions? My expected result is given below.
You can replace the logic with:
having min(dob) = max(dob) or
min(dob) is null
Depending on your data, count(distinct) can be relatively expensive, so this can actually be cheaper than using it.
You can use count(distinct). Just change the comparison value:
having count(distinct dob) <= 1
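A small sketch comparing the original HAVING condition (written here with the numeric literal 1) against both suggested replacements. It runs against SQLite through Python's sqlite3 module, with made-up rows and a hypothetical table name people. A group whose DOB is entirely NULL (B) is dropped by the original condition but kept by the other two. Both suggestions also keep a group with one DOB plus some NULLs (C), since MIN, MAX and COUNT(DISTINCT) all ignore NULLs.

```python
import sqlite3

# Made-up rows: (name, dob); None stands for NULL.
ROWS = [
    ("A", "1990-01-01"), ("A", "1990-01-01"),  # one distinct DOB
    ("B", None), ("B", None),                  # all NULL
    ("C", "1990-01-01"), ("C", None),          # one DOB plus a NULL
    ("D", "1990-01-01"), ("D", "1991-01-01"),  # two different DOBs
]

ORIGINAL_QUERY = """
SELECT name FROM people GROUP BY name
HAVING COUNT(DISTINCT dob) = 1
ORDER BY name
"""

MIN_MAX_QUERY = """
SELECT name FROM people GROUP BY name
HAVING MIN(dob) = MAX(dob) OR MIN(dob) IS NULL
ORDER BY name
"""

COUNT_DISTINCT_QUERY = """
SELECT name FROM people GROUP BY name
HAVING COUNT(DISTINCT dob) <= 1
ORDER BY name
"""


def run(query):
    """Load ROWS into a fresh in-memory table and return the names the query keeps."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE people (name TEXT, dob TEXT)")
        con.executemany("INSERT INTO people VALUES (?, ?)", ROWS)
        return [row[0] for row in con.execute(query)]
    finally:
        con.close()


print(run(ORIGINAL_QUERY))        # ['A', 'C']
print(run(MIN_MAX_QUERY))         # ['A', 'B', 'C']
print(run(COUNT_DISTINCT_QUERY))  # ['A', 'B', 'C']
```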

Get distinct information across many fields some of which are NULL

I have a table with just over 65 million rows and 140 columns. The data comes from several sources and is submitted at least every month.
I am looking for a quick way to grab specific fields from this data only where they are unique. The thing is, I want to process all the information to link which invoice was sent with which identifying numbers, and who sent it. The issue is that I don't want to iterate over 65 million records. If I can get the distinct values, I will only have to process, say, 5 million records instead of 65 million. See below for a description of the data, and the SQL Fiddle for a sample.
If, say, a client submits an invoice_number linked to passport_number_1, national_identity_number_1 and driving_license_1 every month, I only want one row where this appears, i.e. the combination of the 4 fields has to be unique.
If they submit the above for 30 months and then, in the 31st month, send the invoice_number linked to passport_number_1, national_identity_number_2 and driving_license_1, I want to pick this row as well, since the national_identity field is new and hence the whole row is unique.
By "linked to" I mean they appear on the same row.
Any of the fields can be NULL at some point.
The 'pivot/composite' columns are invoice_number and submitted_by. If either of those is missing, drop that row.
I also need to include the database id with the above data, i.e. the primary_id that is auto-generated by the PostgreSQL database.
The only fields that don't need to be returned are other_column and yet_another_column. Remember the table has 140 columns, so I don't need them.
With the results, create a new table that will hold these unique records.
See this SQL fiddle for an attempt to recreate the scenario.
From that fiddle, I'd expect a result like:
Row 1, 2 & Row 11: Only one of them should be kept as they are exactly the same, preferably the row with the smallest id.
Row 4 and Row 9: One of them would be dropped as they are exactly the same.
Row 5, 7, & 8: Would be dropped since they are missing either the invoice_number or submitted_by.
The result would then have Row (1, 2 or 11), 3, (4 or 9), 6 and 10.
To get one representative row (with additional fields) for each distinct combination of the four fields:
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
;
Note that which row is returned is unpredictable unless you specify an ordering (see the documentation on DISTINCT).
Edit:
To order this result by id, simply adding order by id to the end doesn't work (PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions), but it can be done either by using a CTE
with distinct_rows as (
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
-- ...
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
)
select *
from distinct_rows
order by id;
or by making the original query a subquery
select *
from (
SELECT
distinct on (
invoice_number
, passport_number
, national_id_number
, driving_license_number
-- ...
)
* -- specify the columns you want here
FROM my_table
where invoice_number is not null
and submitted_by is not null
) t
order by id;
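DISTINCT ON is PostgreSQL-specific. In PostgreSQL itself, adding ORDER BY invoice_number, passport_number, national_id_number, driving_license_number, id inside the DISTINCT ON query should make each group keep its smallest id, as the question asks. As a portable alternative, the sketch below swaps in ROW_NUMBER() partitioned by the same four columns and ordered by id. It then stores the surviving rows with CREATE TABLE ... AS, which covers the "create a new table" requirement. It runs against SQLite (3.25 or later, for window functions) through Python's sqlite3 module. The rows and the table name unique_invoices are made up for illustration.

```python
import sqlite3

# Made-up rows: (id, invoice_number, passport_number, national_id_number,
#                driving_license_number, submitted_by); None stands for NULL.
ROWS = [
    (1, "INV1", "P1", "N1", "D1", "alice"),
    (2, "INV1", "P1", "N1", "D1", "alice"),  # duplicate of id 1
    (3, "INV1", "P1", "N2", "D1", "alice"),  # new national id -> kept
    (4, "INV2", None, "N3", None, "bob"),
    (5, None, "P2", "N4", "D2", "bob"),      # no invoice_number -> dropped
    (6, "INV3", "P3", "N5", "D3", None),     # no submitted_by -> dropped
    (7, "INV2", None, "N3", None, "bob"),    # duplicate of id 4 (NULLs partition together)
]

# rn = 1 marks the smallest id within each combination of the four fields.
DEDUP_SELECT = """
SELECT id, invoice_number, passport_number, national_id_number,
       driving_license_number, submitted_by
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY invoice_number, passport_number,
                            national_id_number, driving_license_number
               ORDER BY id
           ) AS rn
    FROM my_table
    WHERE invoice_number IS NOT NULL
      AND submitted_by IS NOT NULL
) AS ranked
WHERE rn = 1
"""


def unique_ids(rows):
    """Build the hypothetical unique_invoices table and return its ids in order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE my_table (id INTEGER PRIMARY KEY, invoice_number TEXT, "
            "passport_number TEXT, national_id_number TEXT, "
            "driving_license_number TEXT, submitted_by TEXT)"
        )
        con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", rows)
        con.execute("CREATE TABLE unique_invoices AS " + DEDUP_SELECT)
        return [row[0] for row in con.execute("SELECT id FROM unique_invoices ORDER BY id")]
    finally:
        con.close()


print(unique_ids(ROWS))  # [1, 3, 4]
```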
quick way to grab specific fields from this data only where they are unique
I don't think so. I think you mean you want to select a distinct set of rows from a table in which they are not unique.
As far as I can tell from your description, you simply want
SELECT distinct invoice_number, passport_number,
driving_license_number, national_id_number
FROM my_table
where invoice_number is not null
and submitted_by is not null;
In your SQLFiddle example, that produces 5 rows.

SQL - Insert using Column based on SELECT result

I currently have a table called tempHouses that looks like:
avgprice | dates | city
Dates are stored as yyyy-mm-dd.
However, I need to move the records from that table into a table called houses that looks like:
city | year2002 | year2003 | year2004 | year2005 | year2006
The information in tempHouses contains average house prices from 1995 to 2014.
I know I can use SUBSTRING to get the year from the dates (string positions start at 1):
SUBSTRING(dates, 1, 4)
So for each city in tempHouses.city, I need to get the average house price for each of the above years into one record.
Any ideas on how I would go about doing this?
This is a SQL Server approach, and a PIVOT may be better, but here's one way:
SELECT City,
    AVG(year2002) AS year2002,
    AVG(year2003) AS year2003,
    AVG(year2004) AS year2004
FROM (
    SELECT City,
        CASE WHEN Dates >= '2002-01-01' AND Dates < '2003-01-01' THEN avgprice END AS year2002,
        CASE WHEN Dates >= '2003-01-01' AND Dates < '2004-01-01' THEN avgprice END AS year2003,
        CASE WHEN Dates >= '2004-01-01' AND Dates < '2005-01-01' THEN avgprice END AS year2004
        -- Repeat for each year
    FROM tempHouses
) AS t
GROUP BY City
The inner query gets the data into the correct format for each record (City, year2002, year2003, year2004), while the outer query gets the average for each City. The CASE expressions have no ELSE, so rows from other years become NULL and are ignored by AVG (an ELSE 0 would pull every average toward zero), and the half-open date ranges work whether Dates is a date column or a yyyy-mm-dd string.
There may be many ways to do this, and performance may be the deciding factor in which one to choose.
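The question ultimately wants the pivoted averages written into houses. Below is a minimal sketch of that INSERT ... SELECT with conditional aggregation, run against SQLite through Python's sqlite3 module as a stand-in for SQL Server or MySQL. It assumes dates is a yyyy-mm-dd string and uses SQLite's substr() for the year. Only year2002 and year2003 are filled, and the sample rows are made up.

```python
import sqlite3

# Made-up rows: (city, dates, avgprice)
ROWS = [
    ("London", "2002-03-01", 100),
    ("London", "2002-09-01", 200),
    ("London", "2003-01-01", 300),
    ("Leeds", "2002-05-01", 50),
    ("Leeds", "1999-01-01", 10),  # outside the pivoted years -> ignored
]

# Each CASE yields avgprice for its year and NULL otherwise; AVG skips NULLs.
PIVOT_INSERT = """
INSERT INTO houses (city, year2002, year2003)
SELECT city,
       AVG(CASE WHEN substr(dates, 1, 4) = '2002' THEN avgprice END),
       AVG(CASE WHEN substr(dates, 1, 4) = '2003' THEN avgprice END)
FROM tempHouses
GROUP BY city
"""


def build_houses(rows):
    """Pivot the sample rows into houses and return its contents sorted by city."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tempHouses (avgprice REAL, dates TEXT, city TEXT)")
        con.executemany(
            "INSERT INTO tempHouses (city, dates, avgprice) VALUES (?, ?, ?)", rows
        )
        con.execute("CREATE TABLE houses (city TEXT, year2002 REAL, year2003 REAL)")
        con.execute(PIVOT_INSERT)
        return con.execute(
            "SELECT city, year2002, year2003 FROM houses ORDER BY city"
        ).fetchall()
    finally:
        con.close()


print(build_houses(ROWS))  # [('Leeds', 50.0, None), ('London', 150.0, 300.0)]
```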
The best way would be to use a script to run the queries for you, because you will need to run them multiple times and extract the data year by year. Make sure that the only required (NOT NULL) columns in the new table are city and the row id, so that each city can be inserted first and the yearly values filled in later:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
INSERT INTO <table> (city) SELECT DISTINCT `city` FROM <old_table>;
Then for each city extract the average values, insert them into a temporary table and then insert them into the main table.
SELECT city, avg(avgprice), substring(dates, 1, 4) AS yr FROM <old_table> GROUP BY city, substring(dates, 1, 4);
Otherwise you're looking at a combined query using joins and potentially unions to reshape the data. Because you're flattening the table into a single row per city, it's going to be a little tough to do. You should create an index on the date column first if you don't want the query to fail with memory limits or take a very long time to execute.

How to make multiple changes in a table at once in SQL Server

I have a table named index which contains columns such as accountno, opendate, name, address, etc.
I want to change the open date of some account numbers to a specific value.
How can I do this at once?
That is, I have to set the opendate of some account numbers (more than 100) to 01.01.1990.
But the account numbers are different. How can I do this in a single query?
Wouldn't something like this work?
UPDATE
MyIndexTable
SET
opendate = <desired date>
WHERE
accountno IN (
a1, a2, ..., a100
);
If your account numbers don't satisfy a closed formula, you have to write out all 100 of them anyway.
One way to do this is to use a comma-separated list, or a table with the IDs of the records you would like to change (this only works if you are setting them all to the same value). Because index is a reserved word in SQL Server, the table name has to be bracketed, and the 'yyyymmdd' date literal is interpreted the same way regardless of language settings. Here is an example with a comma-separated list:
UPDATE [index]
SET opendate = '19900101'
WHERE accountno IN (123, 456)
You can also load the IDs into a table; then the query would be:
UPDATE [index]
SET opendate = '19900101'
WHERE accountno IN (SELECT accountno FROM acctnos_table)
Again, this only works if you are setting all the records to the same date.
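Below is a sketch of the staging-table variant, run against SQLite through Python's sqlite3 module. The account numbers are loaded into acctnos_table with executemany and then applied with a single UPDATE. SQLite quotes the reserved name index with double quotes rather than brackets. The columns and sample accounts are made up for illustration.

```python
import sqlite3

# Made-up accounts: (accountno, opendate, name)
ACCOUNTS = [
    (123, "2005-06-01", "Ann"),
    (456, "2010-02-15", "Bob"),
    (789, "2012-11-30", "Cy"),
]


def reset_open_dates(accounts, targets, new_date="1990-01-01"):
    """Set opendate to new_date for every accountno listed in targets."""
    con = sqlite3.connect(":memory:")
    try:
        # "index" is a keyword, so it must be quoted as a table name.
        con.execute('CREATE TABLE "index" (accountno INTEGER PRIMARY KEY, opendate TEXT, name TEXT)')
        con.executemany('INSERT INTO "index" VALUES (?, ?, ?)', accounts)
        # Staging table holding the account numbers to change.
        con.execute("CREATE TABLE acctnos_table (accountno INTEGER)")
        con.executemany("INSERT INTO acctnos_table VALUES (?)", [(a,) for a in targets])
        # One UPDATE covers every listed account.
        con.execute(
            'UPDATE "index" SET opendate = ? '
            "WHERE accountno IN (SELECT accountno FROM acctnos_table)",
            (new_date,),
        )
        return dict(con.execute('SELECT accountno, opendate FROM "index"'))
    finally:
        con.close()


print(reset_open_dates(ACCOUNTS, [123, 456]))
```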

How do I check if all posts from a joined table has the same value in a column?

I'm building a BI report for a client where there is a 1-n related join involved.
The joined table has a field for employee ID (EmplId).
The query that I've built for this report is supposed to give a 1 in its field "OneEmployee" if all the related rows have the same employee in the EmplId field, and null if there are different employees, i.e.:
TaskTrans
TaskTransHours > EmplId: 'John'
TaskTransHours > EmplId: 'John'
This should give a 1 in the said field in the query
TaskTrans
TaskTransHours > EmplId: 'John'
TaskTransHours > EmplId: 'George'
This should leave the said field blank
The idea is to create a field where a CASE expression checks this and returns the correct value. But my problem is whether there is a way to check for this in SQL at all.
select not count(*) from your_table
where employee_id = GIVEN_ID
and your_field not in ( select min(your_field)
from your_table
where employee_id = GIVEN_ID);
Note: my first idea was to use LIMIT 1 in the inner query, but MySQL doesn't support LIMIT in an IN subquery, so MIN it was; the point is to use any single value. MIN works, but the field should be indexed; then this query will execute quite fast, as only indexes would be used (employee_id should obviously be indexed too).
Note 2: Don't get confused by the NOT in front of count(*). You want 1 when no value differs, so I count the differing rows and return NOT count(*), which is 1 if the count is 0 and 0 otherwise.
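Seems a job for a window function. COUNT(DISTINCT …) OVER (…) is not supported by SQL Server or PostgreSQL, but comparing the windowed MIN and MAX performs the same check. Partition by the TaskTrans key so the check runs per task rather than across the whole result (TaskTrans.Id below is a placeholder for that key):
SELECT
    …,
    CASE
        WHEN MIN(TaskTransHours.EmplId) OVER (PARTITION BY TaskTrans.Id)
           = MAX(TaskTransHours.EmplId) OVER (PARTITION BY TaskTrans.Id)
        THEN 1
    END AS OneEmployee
FROM …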
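Below is a sketch of the windowed MIN/MAX check, run against SQLite (3.25 or later) through Python's sqlite3 module. The TaskTrans.Id and TaskTransHours.TaskId join columns are assumptions, since the question does not show how the tables are related. The sample rows mirror the John/George example.

```python
import sqlite3

# Made-up hours rows: (TaskId, EmplId). TaskId -> TaskTrans.Id is a hypothetical link.
HOURS = [(1, "John"), (1, "John"), (2, "John"), (2, "George")]

# MIN = MAX within a task's partition means every row has the same employee.
QUERY = """
SELECT tt.Id,
       CASE
           WHEN MIN(h.EmplId) OVER (PARTITION BY tt.Id)
              = MAX(h.EmplId) OVER (PARTITION BY tt.Id)
           THEN 1
       END AS OneEmployee
FROM TaskTrans AS tt
JOIN TaskTransHours AS h ON h.TaskId = tt.Id
"""


def one_employee_flags(hours):
    """Return {task id: OneEmployee flag} for the given hours rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE TaskTrans (Id INTEGER PRIMARY KEY)")
        con.execute("CREATE TABLE TaskTransHours (TaskId INTEGER, EmplId TEXT)")
        task_ids = sorted({task_id for task_id, _ in hours})
        con.executemany("INSERT INTO TaskTrans VALUES (?)", [(t,) for t in task_ids])
        con.executemany("INSERT INTO TaskTransHours VALUES (?, ?)", hours)
        # Every joined row carries its task's flag; dict() collapses to one entry per task.
        return dict(con.execute(QUERY))
    finally:
        con.close()


print(one_employee_flags(HOURS))
```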