lundi 20 avril 2015

Conditionally removing duplicates in Excel based on highest value in a Column

I have a 2-column file in excel that that looks like this: The first column is a value, and the Second column contains a corresponding word. However, I need to remove all of the duplicates for which their value in Column 1 is not the highest. Essentially, cleaning up this data sheet, that has millions of entries.

-2  cat
-2  cat
-1  cat
-3  dog
-2  dog
-1  dog
-2  milk
-1  milk
-2  juice
-1  juice

I want to remove all duplicates in Column 2 that do not have the highest value in Column 1. The columns are ordered firstly by Column 2 (therefore the words are in alphabetical order: A-Z) and secondly by Column 1 (from highest to lowest, therefore the highest value is always the first value for each word).

I have been trying to do this using the advanced filtering in Excel for Mac 2011, but I cannot seem to include the condition of removing all duplicate entries except for the entry that has the highest value in Column 1.

The desired output is as follows:

-2  cat
-3  dog
-2  milk
-2  juice

I have been searching for answers based on the query conditionally removing duplicates, here and here, however each of the solutions provided seems to be based on a boolean condition (i.e. there is information or not), and not determining which value is the highest as the condition for which to remove the duplicate or not.

Any ideas on how to go about solving this?

Aucun commentaire:

Enregistrer un commentaire