mikaelhg/ksoup

Ksoup: The Ultimate Kotlin DSL for JSoup 🚀

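Ksoup is an elegant Kotlin DSL wrapper for JSoup that makes web scraping and HTML parsing in Kotlin more intuitive, type-safe, and maintainable. You declare the URL to fetch, the request settings, and a CSS selector with a handler for each value you want, and Ksoup returns a populated result object. It suits both simple extractions and larger scraping workflows.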

✨ Features

  • Type-safe Kotlin DSL for HTML parsing
  • Built on JSoup, with idiomatic Kotlin syntax on top
  • Callback-style data extraction: pair each CSS selector with a handler that fills in your result object
  • Custom HTTP client support
  • Clean, maintainable scraping code
  • Lightweight: JSoup is the only runtime dependency besides the Kotlin standard library

💡 Usage Examples

Basic Extraction

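import io.mikael.ksoup.KSoup

data class GitHubProfile(var username: String = "", var fullName: String = "")

val profile = KSoup.extract<GitHubProfile> {
    // Factory for the result object that the handlers below fill in
    result { GitHubProfile() }

    // Page to fetch, plus request settings
    url = "https://github.com/mikaelhg"
    userAgent = "Mozilla/5.0 Ksoup/1.0"
    headers["Accept-Encoding"] = "gzip"

    // Receives the matched text; `page` is the GitHubProfile created by result { }
    text(".p-name") { text, page ->
        page.fullName = text
    }

    // Receives the matched element itself, for when you need more than its text
    element(".p-nickname") { el, page ->
        page.username = el.text()
    }
}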
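For readers curious how a DSL of this shape works, here is a stdlib-only sketch of the underlying pattern: a builder class whose methods register selector/handler pairs, configured through a lambda with receiver. It is not Ksoup's implementation. Extraction, extract and runOn are hypothetical names, and a map from selector to text stands in for a parsed JSoup document so the example runs offline.

```kotlin
// Hypothetical sketch of a selector-to-handler extraction DSL; NOT Ksoup's actual code.
// A parsed page is simulated as a map from CSS selector to the text it would match.
class Extraction<T : Any> {
    private var factory: (() -> T)? = null
    private val handlers = mutableListOf<Pair<String, (String, T) -> Unit>>()

    // Supplies the object that the handlers will populate.
    fun result(block: () -> T) {
        factory = block
    }

    // Registers a handler that receives the text matched by `selector`.
    fun text(selector: String, handler: (String, T) -> Unit) {
        handlers += selector to handler
    }

    // Creates the result object, then runs every handler whose selector matched.
    fun runOn(page: Map<String, String>): T {
        val create = checkNotNull(factory) { "result { } was not called" }
        val target = create()
        for ((selector, handler) in handlers) {
            page[selector]?.let { handler(it, target) }
        }
        return target
    }
}

// Lambda with receiver: inside `block`, `this` is the Extraction, so result/text need no qualifier.
fun <T : Any> extract(page: Map<String, String>, block: Extraction<T>.() -> Unit): T =
    Extraction<T>().apply(block).runOn(page)

data class GitHubProfile(var username: String = "", var fullName: String = "")

fun main() {
    // Placeholder values, not real scraped data.
    val fakePage = mapOf(".p-name" to "Example User", ".p-nickname" to "example")
    val profile = extract<GitHubProfile>(fakePage) {
        result { GitHubProfile() }
        text(".p-name") { text, p -> p.fullName = text }
        text(".p-nickname") { text, p -> p.username = text }
    }
    println(profile) // prints GitHubProfile(username=example, fullName=Example User)
}
```

Because block has the type Extraction<T>.() -> Unit, each handler is typed (String, T) -> Unit, so the compiler checks every property access on the result object; that is where the type safety of this style of DSL comes from.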

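Custom HTTP Client

To control how pages are fetched, implement the HttpClient interface and assign an instance to httpClient: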

// Implement the HttpClient interface that Ksoup's httpClient property expects
class CustomHttpClient : HttpClient {
    /* Your implementation here */
}

val profile = KSoup.extract<GitHubProfile> {
    httpClient = CustomHttpClient()
    url = "https://github.com/mikaelhg"
    // Extraction logic...
}
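A pluggable HTTP client also makes scraping code testable offline: production code uses a real client, while tests pass one that serves canned HTML. The sketch below shows that pattern with the standard library only. PageFetcher, CannedFetcher and fetchTitle are hypothetical names, and PageFetcher is not Ksoup's HttpClient, whose members are not described here.

```kotlin
// Hypothetical interface standing in for whatever contract Ksoup's HttpClient defines.
interface PageFetcher {
    fun fetch(url: String, headers: Map<String, String>): String
}

// Test double: serves fixed HTML per URL and records every URL it was asked for.
class CannedFetcher(private val pages: Map<String, String>) : PageFetcher {
    val requested = mutableListOf<String>()

    override fun fetch(url: String, headers: Map<String, String>): String {
        requested += url
        return pages[url] ?: error("No canned page for $url")
    }
}

// Code under test depends only on the interface, so a real or a fake client can be passed in.
fun fetchTitle(fetcher: PageFetcher, url: String): String? {
    val html = fetcher.fetch(url, mapOf("Accept-Encoding" to "gzip"))
    // Crude regex extraction keeps the sketch dependency-free; real code would parse with JSoup.
    return Regex("<title>(.*?)</title>", RegexOption.IGNORE_CASE).find(html)?.groupValues?.get(1)
}

fun main() {
    val fake = CannedFetcher(
        mapOf("https://example.com/" to "<html><head><title>Hello</title></head></html>")
    )
    println(fetchTitle(fake, "https://example.com/")) // prints Hello
    println(fake.requested) // prints [https://example.com/]
}
```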

🛠 Roadmap

  • Basic HTML extraction
  • Custom HTTP client support
  • Multi-page extraction
  • "Next page" iteration
  • Enhanced error handling (4xx/5xx responses)
  • Async support
  • Rate limiting utilities
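Until "next page" iteration and rate limiting are available, one way to handle paginated sites is to follow next-page links lazily with Kotlin's generateSequence, sleeping between requests. This is a hypothetical sketch that does not use Ksoup: Page, crawl and the in-memory site map are illustrative stand-ins for real fetching and parsing.

```kotlin
// Hypothetical page model: items scraped from one page plus the URL of the next page, if any.
data class Page(val items: List<String>, val nextUrl: String?)

// Lazily follows next-page links. `fetchPage` stands in for "download and parse one page",
// `delayMillis` is a crude fixed delay between requests, and `maxPages` guards against link cycles.
fun crawl(
    startUrl: String,
    maxPages: Int = 10,
    delayMillis: Long = 0,
    fetchPage: (String) -> Page
): Sequence<Page> =
    generateSequence({ fetchPage(startUrl) }) { page ->
        page.nextUrl?.let { url ->
            if (delayMillis > 0) Thread.sleep(delayMillis)
            fetchPage(url)
        }
    }.take(maxPages)

fun main() {
    val site = mapOf(
        "/p1" to Page(listOf("a", "b"), "/p2"),
        "/p2" to Page(listOf("c"), "/p3"),
        "/p3" to Page(listOf("d"), null)
    )
    val items = crawl("/p1") { url -> site.getValue(url) }
        .flatMap { it.items.asSequence() }
        .toList()
    println(items) // prints [a, b, c, d]
}
```

Because the sequence is lazy, a page is only fetched when the consumer asks for it, so stopping early (for example with first() or takeWhile) also stops the requests.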

📄 License

Ksoup is released under the Apache 2.0 License.