AI Puts Data Analysis in Everyone's Hands. It Doesn't Make the Fundamentals Optional.

Jun 30

AI applied to the right problem, in the right place, with the right guardrails

I built a working data analysis tool for over 3,000 LinkedIn connections using Claude. A classifier, a relationship analyzer, and an AI-powered project matcher. Something that would have taken a small team weeks to design and build.

I built it in a fraction of that time, alone.

That is the real story of what AI is doing to data analysis right now. It is putting serious analytical capability in the hands of anyone willing to think clearly about the problem in front of them.

But here is what that story misses.

The tool works not because AI wrote the code. It works because the analytical thinking that went into designing it was sound. And that thinking, the kind that defines the right problem, chooses the right method, knows what the data can and cannot say, did not come from the model. It came from knowing what good data analysis looks like.

AI opens the door. But walking through it well still requires something AI cannot give you.

What I Built and Why It Matters

Connection Explorer has three layers, and the distinction between them matters.

The first layer is a classifier. Every connection in the export gets tagged across four dimensions: industry, function, seniority, and connection age. This uses a lookup table of around 150 well-known companies mapped explicitly to industries, combined with keyword matching on job titles. No model. Deterministic, auditable, consistent. The same connection classified the same way every time.

The second layer is a relationship analyzer. It parses LinkedIn message history to derive a signal for each connection: Strong, Active, Light, One-sided, or No messages. Again, no model. A clear metric applied consistently across the data.

The third layer is an AI Project Matcher. Describe a project or activity, and it surfaces the most relevant people from your filtered network, with a reason for each match. This is where Claude enters the picture as an active component of the output, not just a tool that helped write the code.

Three layers. Two of them require no AI at all.

Knowing which parts of a problem need AI and which do not is itself an analytical skill. It does not come with the model.

The Classifier: Domain Knowledge Encoded as Logic

Building the classification layer required decisions that no model could make on my behalf.

What categories matter? Not just the obvious ones. In the industrial sectors I work across, Maintenance and Reliability is a distinct professional function with its own vocabulary: turnaround management, asset integrity, condition monitoring, rotating equipment. It does not naturally fall out of a generic job-title keyword list. It required knowing the domain.

How do you handle ambiguity? Merck is two entirely different companies: Merck and Co., the US pharmaceutical giant, and Merck Group, the German chemicals and materials conglomerate. A generic classifier gets this wrong every time. Getting it right required knowing the difference and encoding that explicitly.

How do you prioritize competing signals? The tool uses company name lookup as the first pass, before keyword matching, because a known company is a stronger signal than a job title keyword. That is a design decision grounded in analytical judgment, not something the model suggested.

And critically: the tool lets users add their own keyword rules on top of the built-in classifier. Someone working in process safety can add HAZOP, LOPA, and functional safety as keywords that map to a Process Safety category. Someone in renewable energy can add IPP, BESS, and O&M to surface relevant contacts in their sector. The classifier provides the foundation. The user brings the domain knowledge that makes it relevant to their world.

This is what democratization looks like when it is done well. Not a black box that pretends to know your industry. A system that encodes good analytical practice and invites you to extend it with what you know.

AI can write the code for a classifier. It cannot decide what the categories should be, how to handle edge cases, or how your domain is organized. That requires you.

The Analyzer: Knowing What Your Data Can and Cannot Say

The relationship strength signal is derived from a simple, honest heuristic. More messages, more recent, genuinely two-way: stronger signal. Fewer messages, older, one-sided: weaker signal.

But the most important analytical decision here was not how to build the signal. It was knowing its limits.

Some of your closest professional relationships will show no LinkedIn message history at all. Those people reach you by phone, email, or in person. The signal reflects how you use LinkedIn as a communication channel, not the full picture of your network. Reporting the numbers without naming that boundary would be technically accurate and analytically misleading.

This is a fundamental of good data practice that predates AI by decades. Know what your data can measure. Be explicit about what it cannot. The tool states this directly so that users interpret the signal correctly.

AI did not teach me to do this. Experience with data did.

A tool that gives you numbers without telling you their limits is not doing analysis. It is producing the appearance of analysis. That distinction matters more, not less, as AI makes it easier to generate output.

The Matcher: Where AI Finally Earns Its Place

Once the classifier and analyzer have done their work, a genuinely hard problem remains. Given a project or activity, I am working on, who in this structured, filtered pool is actually relevant? That requires reading a description, interpreting intent, and matching it against hundreds of people with varied titles, functions, and industries.

That is a reasoning problem. That is where AI earns its place.

But even here, analytical thinking determines whether it works.

The naive approach is to send all 3,000 connections to the model at once. That produces a prompt of around 190,000 tokens. Small local models cannot process it. Even capable cloud models work slowly and expensively on unstructured data at that scale. The output, if it comes, is less useful than it should be.

The right approach is to filter first. Use the classifier output to narrow to a relevant industry or function. Use the relationship analyzer to surface contacts worth prioritizing. Then ask the AI to reason across that structured, scoped pool of perhaps 50 to 200 people.

The model does not get better at reasoning because you sent it more data. It gets better because you sent it the right data. Scoping the input before sending it to a model is a data analysis decision, not a technical one.

AI applied to well-structured, appropriately scoped data produces results that are faster, more focused, and easier to trust. That structure is your responsibility, not the model's.

Guardrails: Data Practice Is Not an Afterthought

One more fundamental that AI does not change: how you handle the data matters.

A tool that processes your professional network creates a real and legitimate trust question. The answer here is architectural. There is no server. There is no database. Your CSV is read by your browser, processed locally, and discarded when you close the tab. Nothing is transmitted or stored.

The one genuine exception is the AI Project Matcher. If you use the Claude API path, a compact summary of your currently visible connections is sent to Anthropic's API for that single request. This is stated clearly in the tool before you use it.

Responsible data handling is not a new idea. It is a foundational principle of good analytical practice. What AI has changed is that one person can now implement it properly without a compliance team or a data engineering department. The principle is old. The accessibility is new.

Data practice does not become optional because the tools got easier. If anything, it becomes more important as more people build and use analytical systems without formal training in how to handle data well.

Try It Yourself

Connection Explorer is a single HTML file. Open it in a browser, upload your LinkedIn data export, and your network becomes searchable and filterable in seconds.

The classifier tags every connection by industry, function, seniority, and connection age. The relationship analyzer scores each one based on message history. The custom keyword panel lets you add your own categories based on your domain. The AI Project Matcher is there when you have a specific project and want the model to reason across your filtered network.

It is free. It is open source. It requires no installation, no account, and no subscription. Anyone with a LinkedIn account can export their connections and use this tool today.

That is what AI putting data analysis in everyone's hands looks like. Not a model doing the thinking for you. A capable tool, built on sound analytical practice, accessible to anyone willing to engage with the problem clearly.

The fundamentals made it possible. AI made it fast. The combination made it useful.