External Tables
However, managed tables are less convenient for sharing data with other tools. For example, suppose we have data that is created and used primarily by Pig or other tools, and we want to run some queries against it without giving Hive ownership of the data. We can define an external table that points to that data but doesn't take ownership of it.
Suppose we are analyzing data from the stock markets. Periodically, we ingest the data for NASDAQ and the NYSE from a source like Infochimps (http://infochimps.com/datasets).
The following table declaration creates an external table that can read all the comma-delimited data files in /data/stocks:
hive(Economy)> create external table if not exists stocks(
> `exchange` string, symbol string, ymd string, price_open float, price_high float, price_low float,
> price_close float, volume int, price_adj_close float)
> row format delimited fields terminated by ','
> location '/data/stocks';
Because the table is external, Hive doesn't assume it owns the data. Therefore, dropping the table doesn't delete the data, although the table's metadata is deleted. (Note that the drop can sometimes fail with a "permission denied" error.)
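A minimal sketch of checking this behavior. It assumes the stocks table defined above and read access to /data/stocks. The statements can be run one at a time in the Hive CLI or saved to a file and run with hive -f. The dfs command runs an HDFS shell command from inside Hive.

```sql
-- Drop the external table: only the metastore entry is removed.
drop table if exists stocks;

-- The table no longer appears in the current database...
show tables 'stocks';

-- ...but the comma-delimited data files are still in HDFS.
dfs -ls /data/stocks;
```

Because the files survive, re-running the create external table statement above makes the same data queryable again.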
In addition, you can tell whether a table is managed or external from the output of 'describe extended tablename'.
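For example, assuming the stocks table defined above exists, either of the following statements reports the table type. The comments note what to look for in the output.

```sql
-- The "Detailed Table Information" section is printed on one long line.
-- An external table includes tableType:EXTERNAL_TABLE in it.
-- A managed table shows tableType:MANAGED_TABLE instead.
describe extended stocks;

-- 'describe formatted' prints the same metadata in a more readable layout,
-- including a "Table Type:" row (EXTERNAL_TABLE or MANAGED_TABLE).
describe formatted stocks;
```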
As with managed tables, you can also copy the schema (but not the data) of an existing table:
hive> create external table if not exists Economy1
> like Economy
> location '/data/stocks/path';
If you omit the 'external' keyword and the original table is external, the new table will also be external. If you omit 'external' and the original table is managed, the new table will also be managed. However, if you include the 'external' keyword and the original table is managed, the new table will be external.
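The three cases can be sketched as follows. The sketch assumes the external stocks table defined above, plus a hypothetical managed table named stocks_managed. The copy names and the path /data/stocks_ext_copy are also illustrative only. Results can vary between Hive versions, so it is worth confirming each table type with describe formatted.

```sql
-- Case 1: no 'external' keyword, external source table -> the copy is external.
create table if not exists stocks_copy like stocks;

-- Case 2: no 'external' keyword, managed source table -> the copy is managed.
-- (stocks_managed is a hypothetical managed table.)
create table if not exists stocks_managed_copy like stocks_managed;

-- Case 3: 'external' keyword, managed source table -> the copy is external.
-- (/data/stocks_ext_copy is a hypothetical HDFS path.)
create external table if not exists stocks_ext_copy like stocks_managed
location '/data/stocks_ext_copy';

-- Check the "Table Type:" row of each result.
describe formatted stocks_copy;
describe formatted stocks_managed_copy;
describe formatted stocks_ext_copy;
```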