Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
Forum Posts

Arch_dbxlearner
by New Contributor III
  • 4252 Views
  • 5 replies
  • 1 kudos

How to get data from Splunk on a daily basis?

I am looking for ways to get data into Databricks from Splunk (similar to other data sources like S3, Kafka, etc.). I have received a suggestion to use the Databricks add-on to get/put data from/to Splunk. To pull the data from Databricks to S...

Get Started Discussions
Databricks add-on
Splunk
Latest Reply
shan_chandra
Databricks Employee
  • 1 kudos

@Arch_dbxlearner - please refer to this post for more details: https://community.databricks.com/t5/data-engineering/does-databricks-integrate-with-splunk-what-are-some-ways-to-send/td-p/22048

Phani1
by Valued Contributor II
  • 562 Views
  • 1 reply
  • 0 kudos

Late file arrivals - Autoloader

Hi All, I have a situation where I'm receiving various CSV files in a storage location. The issue I'm facing is that I'm using Databricks Autoloader, but some files might arrive later than expected. In this case, we need to notify the relevant team ab...

Latest Reply
HaggMan
New Contributor III
  • 0 kudos

Well, Autoloader could work nicely with the notification event for arriving files. You could probably specify a window duration for your "on-time" arrivals, and that could be your base check for on time. As files arrive they go to their window and whe...
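The lateness check the reply describes can be sketched in plain Python (this is an illustration of the logic only, not Autoloader code; the grace period and timestamps are made-up values):

```python
from datetime import datetime, timedelta

# Hypothetical grace period: files for a window are "on time" if they
# arrive within this long after the window closes.
GRACE = timedelta(hours=2)

def is_late(window_end: datetime, arrival_time: datetime,
            grace: timedelta = GRACE) -> bool:
    """Flag a file as late if it arrived after window_end + grace."""
    return arrival_time > window_end + grace

# Example: a file for the Jan 1 daily window arriving the next morning.
window_end = datetime(2024, 1, 1, 23, 59)
print(is_late(window_end, datetime(2024, 1, 2, 3, 0)))   # late -> notify team
print(is_late(window_end, datetime(2024, 1, 2, 1, 30)))  # within grace period
```

In a real pipeline, `arrival_time` would come from the file-notification event or the file's modification time surfaced by Autoloader, and the notification step would be whatever alerting the team already uses.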

dipali_globant
by New Contributor II
  • 505 Views
  • 1 reply
  • 0 kudos

Duplicate data published in Kafka offset

We have 25k records, which we publish in batches of 5k. We are numbering the records using the row_number window function and creating batches from that. We have observed that some records (10-20) are getting published duplicated in 2 offsets. Ca...

Latest Reply
agallard
Contributor
  • 0 kudos

Hi @dipali_globant, duplicate data in Kafka can arise in a batch processing scenario for a few reasons. Here's an example of ensuring unique and consistent row numbering: from pyspark.sql import Window from pyspark.sql.functions import row_number wind...
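The reply's PySpark snippet is truncated, but the underlying idea can be sketched in plain Python (an illustration only, not the poster's pipeline): in Spark, row_number() over a window whose orderBy is not a unique, deterministic key can assign different numbers on re-computation, letting the same record land in two batches and hence two Kafka offsets. Numbering over a deterministic total order avoids that:

```python
def number_and_batch(records, key, batch_size=5000):
    """Assign stable row numbers by a *unique* key, then cut fixed-size batches."""
    ordered = sorted(records, key=key)            # deterministic total order
    numbered = list(enumerate(ordered, start=1))  # (row_number, record)
    return [numbered[i:i + batch_size]
            for i in range(0, len(numbered), batch_size)]

rows = [{"id": i, "payload": f"msg-{i}"} for i in range(12)]
batches = number_and_batch(rows, key=lambda r: r["id"], batch_size=5)
print([len(b) for b in batches])  # three batches; no record appears twice
```

In PySpark the equivalent safeguard is to make the window's orderBy column a unique key (or a composite that is unique), so row_number is stable across retries and recomputations.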

prabbalagilead
by New Contributor II
  • 1006 Views
  • 1 reply
  • 0 kudos

How do I find the total number of input tokens to Genie?

I am calculating usage analytics for my work, where they use Genie. I have given the following for my Genie as definitions: (1) instructions, (2) example SQL queries, (3) within the catalog, I went to the relevant table schemas and added comments, descriptio...

Latest Reply
prabbalagilead
New Contributor II
  • 0 kudos

Or is there any set of tables and functions to determine the number of input and output tokens per query?

carolpeixinho
by New Contributor
  • 975 Views
  • 1 reply
  • 0 kudos

Sharing Opportunities with Databricks

Hi everyone, I would like to talk to someone who could set up a process of deal sharing with Databricks following the GDPR. Thanks, Carol.

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi, can you please share some more details on what you are looking for? If you are trying to share data to/from Databricks, you can use the Delta Sharing or Clean Rooms options - these provide data sharing with strong security & governance. Or if yo...

Zoraida
by New Contributor
  • 599 Views
  • 1 reply
  • 0 kudos

Databricks destination with Fivetran best practices

Hello! We are trying to use Fivetran for ingesting different sources into the data lake, so we will have multiple connectors. We would like to know the recommendations when selecting the SQL warehouses. Since the new serverless SQL warehouses...

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi, to understand the Databricks SQL Serverless cost, you can see here: https://www.databricks.com/product/pricing/databricks-sql. In terms of comparison, Databricks is said to be the most cost-efficient & high-performant in the market amongst it...

Phani1
by Valued Contributor II
  • 448 Views
  • 1 reply
  • 0 kudos

code vulnerabilities, code smells, and bugs

Hi Team, is there a way in Databricks to check for code vulnerabilities, code smells, and bugs? Note: Databricks native functionality only.

Latest Reply
SathyaSDE
Contributor
  • 0 kudos

Hi, as far as I am aware, for security scanning/monitoring at the Databricks account level we have the below: SAT - https://github.com/databricks-industry-solutions/security-analysis-tool, https://www.databricks.com/trust/trust, https://learn.microsoft.com/en-us/a...

fridthoy
by New Contributor II
  • 1252 Views
  • 7 replies
  • 0 kudos

Cluster logs folder

Hi, I can't seem to find the cluster_logs folder. Can anyone help me find where the cluster logs are stored? Best regards

Latest Reply
fiff
New Contributor II
  • 0 kudos

Thank you for the help! I have enabled predictive optimization for Unity Catalog, thinking it would automatically perform VACUUM on the tables I have in my delta lake. With that in mind, I assumed VACUUM wouldn't require further attention. Would it be...

sujan1
by New Contributor II
  • 4590 Views
  • 1 reply
  • 1 kudos

Resolved! requirements.txt with cluster libraries

Cluster libraries are supported from version 15.0 - Databricks Runtime 15.0 | Databricks on AWS. How can I specify the requirements.txt file path in the libraries in a job cluster in my workflow? Can I use a relative path? Is it relative to the root of th...

Latest Reply
462098
New Contributor III
  • 1 kudos

To use the new "requirements.txt" feature in your cluster, do the following: change your cluster's "Databricks Runtime Version" to 15.0 or greater (example: "15.4 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)"), then navigate to the cluster's "Libraries...
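The same setting can also be expressed outside the UI. As a sketch only: assuming the Jobs API's `requirements` library type (available from DBR 15.0, taking a workspace or volume path), a job cluster payload might look like the following; the path is a placeholder, not a real workspace path:

```python
# Hedged sketch of a Jobs API job-cluster payload using the
# "requirements" library type (DBR 15.0+). The path is illustrative.
job_cluster = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",  # must be 15.0 or greater
        "num_workers": 2,
    },
    "libraries": [
        {"requirements": "/Workspace/Users/someone@example.com/requirements.txt"}
    ],
}
print(job_cluster["libraries"][0]["requirements"])
```

This would answer the path question in the original post: the path is absolute (workspace or volume), not relative to the bundle or repo root.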

ChristopherQ1
by New Contributor
  • 1294 Views
  • 1 reply
  • 0 kudos

Can we share Delta table data with Salesforce using OData?

Hello! I'm seeking recommendations for streaming on-demand data from Databricks Delta tables to Salesforce. Is OData a viable choice? Thanks.

Latest Reply
matthew_m
Databricks Employee
  • 0 kudos

Hi @ChristopherQ1, Salesforce has released a zero-copy connection that relies on the SQL Warehouse to ingest data when needed. I suggest you consider that instead of OData.   Matthew

Mo_menzje
by New Contributor
  • 385 Views
  • 0 replies
  • 0 kudos

Parallelizing XGBoost Hyperopt run using Databricks

Hi there! I am implementing a classifier for classifying documents into their respective healthcare types. My current setup implements the regular XGBClassifier, whose hyperparameters are tuned on my dataset using Hyperopt. Base...

Phani1
by Valued Contributor II
  • 490 Views
  • 1 reply
  • 0 kudos

CDC for Unstructured data

Hi All, how can we handle CDC for unstructured data in Databricks? What are some best practices we should follow to make this work effectively? Regards, Phani

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Phani1, handling CDC for unstructured data (such as audio, images, or video files) in Databricks involves efficiently detecting and processing changes to these files as they occur. Here's how you can approach this: Use Databricks Autoloader: Autoload...
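One piece of the approach above can be sketched in plain Python (an illustration only, not Databricks-specific code): Autoloader detects *new* files, while *modified* files can be caught by comparing content fingerprints between snapshots. The file names and bytes below are made up:

```python
import hashlib

def file_fingerprints(files):
    """Map each file name to a content hash; comparing two snapshots
    reveals added and modified (CDC) files."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files}

before = file_fingerprints([("a.wav", b"v1"), ("b.png", b"img")])
after  = file_fingerprints([("a.wav", b"v2"), ("b.png", b"img"), ("c.mp4", b"new")])

changed = {n for n in before.keys() & after.keys() if before[n] != after[n]}
added   = after.keys() - before.keys()
print(sorted(changed), sorted(added))  # a.wav was modified, c.mp4 is new
```

At scale the fingerprint table would live in a Delta table keyed by file path, with the hash (or size + modification time as a cheaper proxy) updated on each ingestion run.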

mathijs-fish
by New Contributor III
  • 4384 Views
  • 5 replies
  • 3 kudos

Resolved! "with open" not working with Shared Access Cluster on mounted location

Hi All, for an application that we are building, we need an encoding detector/UTF-8 enforcer. For this, we used the Python library chardet in combination with "with open". We open a file from a mounted ADLS location (we use a legacy Hive metastore). When...

Get Started Discussions
glob
Mount
os
with open
Latest Reply
nagND
New Contributor II
  • 3 kudos

Hi @mathijs-fish @Ayushi_Suthar - I am having the same issue with a shared cluster. I can see the list of PDF files on the mount using dbutils.fs.ls(mount_point), but when I try to read the PDF files using PyPDF, I get FileNotFoundError...

nickneoners
by New Contributor II
  • 3058 Views
  • 5 replies
  • 1 kudos

Variables in databricks.yml "include:" - Asset Bundles

Hi, we've got an app that we deploy to multiple customers' workspaces. We're looking to transition to asset bundles. We would like to structure our resources like:  -src/ -resources/ |-- customer_1/ |-- job_1 |-- job_2 |-- customer_2/ |-- job_...

Latest Reply
Breno_Ribeiro
New Contributor II
  • 1 kudos

I have a similar use case. We have two different hosts for Databricks, EU and NA. In some cases we need to deploy a similar job to both hosts. To solve that, here's how I did it: in the job folder I created different job files, each one for one host. In aditio...
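The per-host layout described above might look like the following minimal databricks.yml sketch (all names and hosts are placeholders; `include:` takes static globs, so bundles do not interpolate variables there, and each target instead supplies its own workspace host):

```
# Hedged sketch: one bundle, one job file per host, two targets.
bundle:
  name: my_app

include:
  - resources/jobs_eu.yml
  - resources/jobs_na.yml

targets:
  eu:
    workspace:
      host: https://my-eu-workspace.cloud.databricks.com
  na:
    workspace:
      host: https://my-na-workspace.cloud.databricks.com
```

Deploying with `databricks bundle deploy -t eu` (or `-t na`) would then pick the matching workspace, with host-specific job definitions kept in their own resource files.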

trevormccormick
by New Contributor III
  • 766 Views
  • 3 replies
  • 0 kudos

Embed Dashboard - GraphQL Operation Not Authentic

I have added a domain to my list of approved domains for embedding dashboards from my Databricks instance. This domain hosts my Docusaurus site. When the page with the embedded dashboard loads, it makes some network requests to Databricks that are fa...

Latest Reply
trevormccormick
New Contributor III
  • 0 kudos

Is it possible that this is happening because the website is not HTTPS?

