Combining data using an Inner Join


The inner join, usually referred to as a JOIN, selects all rows from the tables when there is a match found between the values of the field being compared upon. This type of query is mostly used to combine the data from different tables so that reports can be generated or an analysis can be made. But a misconception about the principle of join might lead to incorrect data result.

Following is an example on how a misconception about the join principle can generate faulty data, and another example on how to generate the acceptable output.


EXAMPLE DATABASE: Training2

DATA INPUT

The data input are city table and country table from the Training2.mdb database.

The city table contains the country code, city name and city population.



The country table consists the country code, country name and country population.



QUERY join1

SELECT *
FROM city, country;

The join1 query tries to combine the records of the country table and the city table, but the results returned are unacceptable.



DISCUSSION

The goal of the previous query is to combine the rows of the city table to the corresponding rows of the country table. But the query generated faulty result. It selects everything from both tables and join the records in a way such that, for each record in the city table, it will merge all the records in the country table. The output produced is an example of what is called a Cartesian product.

A misconception in the query is that all the records from both tables will be selected and the result will have the correct data expected from it. A factor that was neglected in the query is the specification on how those records must be combined. In the earlier definition of join, it is stated, that there should be matching values between the binding fields that are being compared upon. The condition on how data must be combined is always a requirement in making joins, otherwise, an erroneous result will be generated.


QUERY join2

SELECT *
FROM city, country
WHERE city.cnt_code = country.cnt_code

The join2 query tries to combine the corresponding records of the country table and the city table using the country code of both tables as its binding field. The binding field of both tables must have the same value for a join to take place. This JOIN query provides the expected combined records of the city and country tables.



DISCUSSION

The goal of the query to correctly combine data from the country table and the city table is obtained because of the WHERE condition added in the syntax. The condition is that the city.cnt_code should be the same as country.cnt_code. This query uses an implied join statement to combine data.

But there is another way on how data from those tables can be combined by explicitly using the INNER JOIN statement. This statement have the same output as the join2 query.

SELECT *
FROM city INNER JOIN country 
ON city.cnt_code=country.cnt_code;

In the query result, it can be observed that the binding field (cnt_code) have the table names preceding it. It is bacause the field names are the same for the two tables and SQL would like to indicate from what table that particular field was derived. For other fields, the table names are not included because they are unique fields for each table.


Last modified September 23, 2008 6:06 am