BrandQL Scraper Improvements & Scoring

Hey folks, Christian here—the one-man band running BrandQL. As a solo developer and entrepreneur, I'm always tweaking things behind the scenes to make our tool more reliable and useful for you all. Today, I'm excited to share details on a recent update to our scrape engine and scoring system. This isn't just a minor tweak; it's a step toward making logo and brand asset retrieval even smarter and more comprehensive.

The Motivation Behind the Update

When I first built BrandQL, the scraper focused on the most obvious sources: standard image tags, favicons, and meta-linked logos. But as I dug deeper—listening to feedback from developers and product managers like you—I realized we were missing out on a ton of potential resources. Websites are getting more creative with how they embed logos, especially with modern web design trends. Inline SVGs buried in the HTML? Check. Background images defined in CSS that scream "logo"? Absolutely. And a few other sneaky spots I won't spoil here.

So, in this update, I've expanded the scraper's reach to include these additional sources. That means we're now pulling in site-embedded elements, such as inline SVGs and CSS background images, that were previously overlooked. The goal? To give you a fuller set of options when querying for a domain's brand assets via our API or frontend.
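
To make the idea concrete, here is a minimal sketch of collecting candidates from the sources mentioned above. It is an illustration using Python's standard library, not BrandQL's actual scraper; the names `CandidateCollector` and `collect_candidates` are made up for this example, and it only looks at inline `style` attributes rather than external stylesheets.

```python
import re
from html.parser import HTMLParser

# Matches url(...) inside a CSS declaration such as background-image.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")


class CandidateCollector(HTMLParser):
    """Collects (source_kind, reference) pairs for possible logo assets."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self.candidates.append(("img", a["src"]))
        elif tag == "link" and "icon" in (a.get("rel") or "").lower() and a.get("href"):
            self.candidates.append(("favicon", a["href"]))
        elif tag == "svg":
            # Inline SVG has no URL, so record an identifying attribute instead.
            self.candidates.append(("inline_svg", a.get("id") or a.get("class") or ""))
        # Inline style attributes can carry CSS background images on any tag.
        style = a.get("style") or ""
        if "background" in style:
            for url in CSS_URL.findall(style):
                self.candidates.append(("css_background", url.strip()))


def collect_candidates(html):
    """Parse an HTML string and return every candidate found, in document order."""
    parser = CandidateCollector()
    parser.feed(html)
    return parser.candidates
```

Feeding a page's HTML to `collect_candidates` returns a flat list that mixes the classic sources with the newer ones, which is exactly why the result sets grew and why scoring had to follow.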

Challenges and Solutions

Of course, with great power comes... a flood of data. Opening up to more sources meant our results suddenly ballooned. We're talking way more images per domain—great in theory, but in practice, a lot of them were noise. Think social media icons, decorative graphics, or unrelated stock photos that somehow matched our initial filters.

This is where the real work came in. I had to overhaul the scoring system from the ground up. Previously, it was a simple heuristic based on file names, sizes, and positions on the page. Now, it's more sophisticated:

Relevance Filtering: Added new rules to weed out common irrelevant items, like tiny social badges (e.g., Twitter birds or Facebook thumbs) or generic placeholders. If it's under a certain dimension or matches known patterns for non-logo assets, it gets downgraded or filtered out entirely.
Priority Scoring: Boosted weights for indicators of "logo-ness." For example, inline SVGs in header sections score higher. CSS backgrounds that are prominently placed? They get a bump too. I also incorporated aspect ratio checks, since logos tend to be square or moderately wide rather than extremely tall or thin, and a format check that favors vector formats like SVG for scalability.
Domain-Specific Tweaks: Per-domain results are now more tailored. If a site uses a lot of CSS tricks, we prioritize those; for simpler sites, we stick to classics.
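
The rules above can be sketched roughly like this. Everything here is an assumption for illustration: the `Candidate` fields, the weights, the 24-pixel cutoff, and the list of name hints are invented, and the real scoring system is more involved.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str        # "img", "favicon", "inline_svg", or "css_background"
    name: str          # file name, id, or class hint
    width: int
    height: int
    in_header: bool = False


# Hypothetical name hints for non-logo assets.
NOISE_HINTS = ("twitter", "facebook", "instagram", "linkedin", "placeholder")
MIN_SIDE = 24  # assumed cutoff in pixels; anything smaller is dropped


def score(c):
    """Return a numeric score, or None if the candidate is filtered out."""
    name = c.name.lower()
    if min(c.width, c.height) < MIN_SIDE:
        return None  # too small to be a usable logo
    s = 0.0
    if any(hint in name for hint in NOISE_HINTS):
        s -= 3.0  # downgrade social badges and placeholders
    if "logo" in name:
        s += 3.0
    if c.in_header:
        s += 2.0  # header placement is a strong signal
    if c.source == "inline_svg" or name.endswith(".svg"):
        s += 1.5  # favor vector formats
    ratio = c.width / c.height
    if 1.0 <= ratio <= 5.0:
        s += 1.0  # square to moderately wide
    return s


def rank(candidates):
    """Drop filtered candidates and sort the rest, best first."""
    scored = [(score(c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s is not None]
    return [c for s, c in sorted(kept, key=lambda pair: pair[0], reverse=True)]
```

With these made-up weights, a header SVG named `site-logo` outranks a wide hero background, a social icon lands at the bottom, and a 10x10 image is dropped entirely.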

The end result? Cleaner, more prioritized lists. You won't have to sift through dozens of duds to find the gold. And yes, the average result set per domain is larger now—often 20-50% more complete—without the bloat.

What This Means for You

If you're an entrepreneur integrating BrandQL into your app, this means fewer misses when fetching logos dynamically. Developers? You'll appreciate the reduced noise in API responses, making your code leaner and your users happier.

I've tested this on hundreds of domains, from big tech sites to indie startups, and the improvements are noticeable. For instance, on a site heavy with CSS backgrounds, we now reliably pull the main logo that was previously invisible to our scraper.

Looking Ahead

This is just the beginning. Next on my roadmap: even finer-grained scoring with some ML-assisted classification (e.g. dark vs. light) and relevance checks, without overcomplicating things, since I'm still solo here. Plus, image processing upgrades like auto-cropping, on-the-fly format conversion, and maybe even color palette extraction for full brand kits.

If you've been using BrandQL, these changes are live now—no updates required on your end. Check out the updated docs at brandql.com/docs for examples, or hit up our API with a fresh query to see the difference.

As always, your input drives this ship. Drop thoughts in the comments, or find me on X (@useBrandQL). Let's keep building better tools together.

Cheers,
Christian
