Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

syedmuhammedmeh
by New Contributor III
  • 4325 Views
  • 2 replies
  • 6 kudos

Resolved! Databricks Kafka Read Not connecting

I'm trying to read data from GCP Kafka through Azure Databricks, but I'm getting the warning below and the notebook never completes. Any suggestions, please? WARN NetworkClient: Consumer groupId Bootstrap broker rack disconnected. Please note I've properly c...

Latest Reply
jose_gonzalez
Databricks Employee
  • 6 kudos

Could you share the full error stack trace from your driver's logs? This is a warning message; we need to look at the error-level messages.

  • 6 kudos
1 More Replies
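For the Kafka thread above, a minimal sketch of how such a read is usually configured, plus a plain TCP reachability check that can be run from the driver before starting the stream. The broker address, topic name, and `SASL_SSL` default are placeholders, not values from the post, and the helper names are hypothetical. A repeated "disconnected" warning can come from a network path or security-protocol mismatch, but that is only one possible cause.

```python
import socket


def kafka_read_options(bootstrap_servers, topic, security_protocol="SASL_SSL"):
    """Build the option map for Spark's Kafka source (values are placeholders)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
        "kafka.security.protocol": security_protocol,
    }


def split_brokers(bootstrap_servers):
    """Parse 'host:port,host:port' into a list of (host, port) tuples."""
    pairs = []
    for entry in bootstrap_servers.split(","):
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def broker_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to the broker can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def read_kafka_stream(spark, options):
    """Start a streaming read; `spark` is the notebook's SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()
```

A successful TCP check only shows that the bootstrap address is reachable; the brokers' advertised listeners must also be reachable from the cluster.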
antoniok
by New Contributor II
  • 4875 Views
  • 1 replies
  • 3 kudos

dbutils.fs.ls is giving "null uri host This can be caused by unencoded / in the password string"

I'm trying to list the files in an S3 bucket. I initially used "aws s3 ls <s3://>" to list the files, and it worked. However, when trying to do the same using dbutils.fs.ls, I'm getting java.lang.NullPointerException: null uri host. This can be ...

Latest Reply
marcus1
New Contributor III
  • 3 kudos

You might be encountering an issue with bucket naming, which I'm also getting with a bucket named something.[0-9]: https://issues.apache.org/jira/browse/HADOOP-17241

  • 3 kudos
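The linked Hadoop issue appears to concern bucket names that cannot be parsed as URI hostnames. As a rough illustration, the sketch below approximates the RFC 2396 hostname rule (each label alphanumeric with inner hyphens, last label starting with a letter). The function name is hypothetical, and the check is an approximation, not the exact logic Hadoop or the JVM uses.

```python
import re

# One DNS-style label: alphanumeric, optionally with hyphens in the middle.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def parses_as_hostname(bucket):
    """Approximate check: would this bucket name be accepted as a URI host?"""
    labels = bucket.split(".")
    if not all(_LABEL.fullmatch(label) for label in labels):
        return False
    # RFC 2396 requires the last label to start with a letter.
    return labels[-1][0].isalpha()
```

Under this approximation a name such as `something.1` or `my_bucket` is rejected, which matches the pattern described in the reply.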
Lizzz
by New Contributor II
  • 4823 Views
  • 1 replies
  • 3 kudos

Resolved! Forward Spark structured streaming metrics to Datadog

We have a Spark streaming application written in PySpark that we'd like to monitor with Datadog. By default, Datadog collects a couple of streaming metrics like 'spark.structured_streaming.processing_rate' and 'spark.structured_streaming.latency'. Ho...

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

@Liz Zhang, please refer to the documentation below, which contains a PySpark implementation of StreamingQueryListener: https://www.databricks.com/blog/2022/05/27/how-to-monitor-streaming-queries-in-pyspark.html

  • 3 kudos
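As a rough sketch of the listener approach mentioned in the reply: the helper below flattens a few fields of a query progress report into metric name/value pairs, and `make_listener` wraps it in a PySpark `StreamingQueryListener`. The metric prefix, the choice of fields, and the `emit` callback are assumptions for illustration; `emit` stands in for whatever client actually sends gauges to Datadog.

```python
import json


def progress_to_metrics(progress, prefix="spark.structured_streaming.custom"):
    """Flatten selected fields of a progress dict into (name, value) pairs."""
    fields = ("numInputRows", "inputRowsPerSecond", "processedRowsPerSecond")
    metrics = [
        (f"{prefix}.{name}", float(progress[name]))
        for name in fields
        if name in progress
    ]
    # durationMs maps a phase name (e.g. addBatch) to milliseconds spent.
    for phase, millis in progress.get("durationMs", {}).items():
        metrics.append((f"{prefix}.duration_ms.{phase}", float(millis)))
    return metrics


def make_listener(emit):
    """Build a listener that calls emit(name, value) for each metric.

    Imported lazily so the pure helper above works without PySpark installed.
    """
    from pyspark.sql.streaming import StreamingQueryListener

    class MetricsListener(StreamingQueryListener):
        def onQueryStarted(self, event):
            pass

        def onQueryProgress(self, event):
            report = json.loads(event.progress.json)
            for name, value in progress_to_metrics(report):
                emit(name, value)

        def onQueryTerminated(self, event):
            pass

    return MetricsListener()
```

In a notebook this would be registered with `spark.streams.addListener(make_listener(emit))`, on a runtime whose PySpark version supports Python listeners.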
fhte
by New Contributor
  • 2789 Views
  • 2 replies
  • 0 kudos

How to install R GeoLift library on Databricks

Hi, I am having problems installing the GeoLift library. I am proceeding as per the official instructions (https://facebookincubator.github.io/GeoLift/docs/GettingStarted/InstallingR). This is what I run in the notebook: 1) I install this particular vers...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Ludmila Kuncarova, I would like to share the following link to our docs, where you will find more details on how to install R libraries: https://docs.databricks.com/libraries/notebooks-r-libraries.html

  • 0 kudos
1 More Replies
Yuliya
by New Contributor II
  • 2948 Views
  • 2 replies
  • 3 kudos

Azure Databricks SQL Warehouse connection issue

When trying to start a SQL Warehouse from my Azure pay-as-you-go subscription, I get an error about not enough vCPUs being provisioned. The documentation says to increase the quota in the Azure portal, but that requires knowing which type of vCPUs to provision. What type of...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Yuliya Quintela, just a friendly follow-up. Did Rostislaw's response help you resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.

  • 3 kudos
1 More Replies
Frank
by New Contributor III
  • 14224 Views
  • 9 replies
  • 2 kudos

SQLAlchemy ORM Connection String Error

We tried to insert records into a Delta table using an ORM. It looks like only SQLAlchemy has an option to connect to a Delta table. We tried the following code: from sqlalchemy import Column, String, DateTime, Integer, create_engine   engine = create_engine("data...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 2 kudos

Hi @Frank Zhang, please disregard the driver comment. The Python SQL Connector requires no driver; just a pip install and you are good to go. The links you provided don't actually show a working example of using SQLAlchemy's ORM to connect to Data...

  • 2 kudos
8 More Replies
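For readers following the SQLAlchemy thread, here is a hedged sketch of assembling a connection URL. It assumes the `databricks://token:...@host?http_path=...` form used by the SQLAlchemy dialect that accompanies the Databricks SQL connector; the exact scheme depends on which dialect package and version is installed, so treat it as an assumption. The helper name and all example values are hypothetical, and this is not the code from the original post.

```python
from urllib.parse import quote, urlencode


def databricks_sqlalchemy_url(host, http_path, token, catalog=None, schema=None):
    """Assemble a SQLAlchemy URL (assumed scheme: databricks://)."""
    query = {"http_path": http_path}
    if catalog:
        query["catalog"] = catalog
    if schema:
        query["schema"] = schema
    # Percent-encode the token so characters like '/' or '@' cannot break the URL.
    return f"databricks://token:{quote(token, safe='')}@{host}?{urlencode(query)}"


# With the dialect installed, the URL would be passed to SQLAlchemy:
#   from sqlalchemy import create_engine
#   engine = create_engine(databricks_sqlalchemy_url(host, http_path, token))
```

Keeping the URL construction in one function makes it easier to swap the scheme if a different dialect is in use.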
KrishZ
by Contributor
  • 2057 Views
  • 2 replies
  • 0 kudos

Where to report a bug with Databricks?

I have an issue in pyspark.pandas to report. Is there a GitHub repository or a forum where I can register my issue? Here's the issue

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Krishna Zanwar, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

  • 0 kudos
1 More Replies
PriyaTech
by New Contributor
  • 4869 Views
  • 1 replies
  • 2 kudos

Resolved! Converting Dataframe into Nested xml

E.g., the dataframe has firstname, lastname, middlename, id, salary. I need to convert the dataframe to an XML file, but in nested format. Output as nested XML: <Name> <firstname> <middlename> <lastname> </Name> <id></id> <salary></salary> Does anyone have ideas ho...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Databricks has an XML connector: https://docs.databricks.com/data/data-sources/xml.html Basically, you just define a df with the correct structure and write it to XML. To create a nested df, here you can find some info.

  • 2 kudos
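To make the target shape from the question concrete, here is a plain-Python sketch that nests the three name fields under `<Name>` for a single row. It uses the standard library's `xml.etree`, not the XML connector from the reply, and the `employee` row tag is a hypothetical choice. In Spark, the equivalent nesting would be a struct column built from the name fields before writing with the XML data source.

```python
import xml.etree.ElementTree as ET

# Columns that should be grouped under the nested <Name> element.
NAME_FIELDS = ("firstname", "middlename", "lastname")


def row_to_nested_xml(row, row_tag="employee"):
    """Render one flat row (a dict) as XML with the name fields nested."""
    root = ET.Element(row_tag)
    name = ET.SubElement(root, "Name")
    for field in NAME_FIELDS:
        ET.SubElement(name, field).text = str(row[field])
    # The remaining columns stay at the top level of the row element.
    for field in ("id", "salary"):
        ET.SubElement(root, field).text = str(row[field])
    return ET.tostring(root, encoding="unicode")
```

This only illustrates the output layout; for real datasets the connector route in the reply avoids collecting rows to the driver.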
LearningDatabri
by Contributor II
  • 8978 Views
  • 8 replies
  • 9 kudos

repos issue

Why does Repos work in one workspace but not in another? Both have Repos enabled.

Latest Reply
Prabakar
Databricks Employee
  • 9 kudos

Do you see any errors? What is the issue that you are facing? Could you please describe the problem in more detail?

  • 9 kudos
7 More Replies
Abhijeet
by New Contributor III
  • 3336 Views
  • 3 replies
  • 6 kudos

Resolved! Streaming pipeline orchestration

For a batch job, I can use ADF and the Databricks notebook activity to create a pipeline. Similarly, which Azure stack should I use to run a Structured Streaming Databricks notebook in a production-ready pipeline?

Latest Reply
Abhijeet
New Contributor III
  • 6 kudos

ok Sure

  • 6 kudos
2 More Replies
Frank
by New Contributor III
  • 6477 Views
  • 1 replies
  • 2 kudos

Resolved! Serverless or Managed

We have about 12k writes/s and 1.5 TB/month of compressed S3 data. How can we choose between serverless and managed? And what would be a good way to project the cost? In serverless, how are the machines and hours scaled or scheduled based on the load? If there is a l...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

Hi @Frank Zhang, regarding "How can we choose between serverless and managed? And what would be a good way to project the cost?": once you enable the serverless feature on your workspace, new warehouses are created with the serverless option by default. If yo...

  • 2 kudos
Monika8991
by New Contributor II
  • 3823 Views
  • 2 replies
  • 1 kudos

Getting spark/scala versioning issues while running the spark jobs through Jar

We tried moving our Scala script from a standalone cluster to the Databricks platform. Our script is compatible with the following versions: Spark: 2.4.8, Scala: 2.11.12. The Databricks cluster has the following Spark/Scala versions: Spark: 3.2.1, Scala: 2.12. 1: we ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Monika Samant, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Than...

  • 1 kudos
1 More Replies
j_afanador
by Contributor II
  • 2379 Views
  • 1 replies
  • 2 kudos

Resolved! Badge not received for Databricks Lakehouse Fundamentals Accreditation

Hello! I cleared the assessment for the Databricks Lakehouse Fundamentals Accreditation but have not received a badge. Kindly assist me with this.

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Juan Afanador, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

  • 2 kudos
Maho
by New Contributor
  • 1949 Views
  • 1 replies
  • 1 kudos

Resolved! Lakehouse Fundamentals badge not received

Hi, I have finished the Lakehouse Fundamentals assessment and received my completion certificate, but so far I have not received a badge for it. Would you be able to assist, please?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Maciej Oleksy, thank you for reaching out! Please submit a ticket to our Training Team here: https://help.databricks.com/s/contact-us?ReqType=training and our team will get back to you shortly.

  • 1 kudos
Trushna
by New Contributor II
  • 4546 Views
  • 3 replies
  • 0 kudos

How to restart Databricks Cluster at specific time?

A command is available for restart, but not at a specific time: databricks clusters restart --cluster-id <>

Latest Reply
karthik_p
Databricks Partner
  • 0 kudos

@Trushna Khatri, adding some more information to Prabakar's reply: can you please let me know the actual need for starting the cluster at a specific time? Usually, if your criterion is to use it for jobs, go with a job cluster. Here the cluster starts whenever your job ...

  • 0 kudos
2 More Replies
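If a timed restart is still needed, one possible approach is to wrap the CLI command from the question in a small script that waits until a chosen wall-clock time. The sketch below assumes the `databricks` CLI is installed and configured on the machine running it and that it accepts the `--cluster-id` form quoted in the question; the function names and the example time are hypothetical.

```python
import datetime as dt
import subprocess
import time


def seconds_until(target, now):
    """Seconds from `now` until the next occurrence of wall-clock time `target`."""
    candidate = dt.datetime.combine(now.date(), target)
    if candidate <= now:
        # The target time has already passed today, so use tomorrow.
        candidate += dt.timedelta(days=1)
    return (candidate - now).total_seconds()


def restart_cluster(cluster_id):
    """Invoke the CLI command quoted in the question for the given cluster."""
    subprocess.run(
        ["databricks", "clusters", "restart", "--cluster-id", cluster_id],
        check=True,
    )


def restart_at(cluster_id, target):
    """Sleep until `target` (local time), then restart the cluster once."""
    time.sleep(seconds_until(target, dt.datetime.now()))
    restart_cluster(cluster_id)
```

For a recurring restart, running the same CLI command from a scheduler such as cron is usually simpler than keeping a sleeping process alive.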