Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

fijoy
by Contributor
  • 1629 Views
  • 3 replies
  • 0 kudos

Is there a utility to convert between "/dbfs" and "dbfs:" path strings?

Is there a built-in utility function, e.g., in dbutils, that can convert between path strings that start with "dbfs:" and "/dbfs"? Some operations, e.g., copying from one location in DBFS to another using dbutils.fs.cp(), expect the path starting with "/db...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Fijoy Vadakkumpadan, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best a...

2 More Replies
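
For the "/dbfs" versus "dbfs:" question above, the conversion itself is a prefix swap, so a small helper can cover it regardless of whether a built-in exists. The following is a minimal sketch in plain Python; the function names to_fuse_path and to_uri_path are hypothetical and are not part of dbutils or any Databricks API.

```python
def to_fuse_path(path: str) -> str:
    """Convert a 'dbfs:/a/b' style path to '/dbfs/a/b'.

    Paths already in '/dbfs/...' form are returned unchanged.
    (Hypothetical helper, not a Databricks API.)
    """
    if path.startswith("dbfs:"):
        rest = path[len("dbfs:"):]
        return "/dbfs/" + rest.lstrip("/")
    if path == "/dbfs" or path.startswith("/dbfs/"):
        return path
    raise ValueError(f"not a DBFS path: {path!r}")


def to_uri_path(path: str) -> str:
    """Convert a '/dbfs/a/b' style path to 'dbfs:/a/b'.

    Paths already in 'dbfs:...' form are returned unchanged.
    (Hypothetical helper, not a Databricks API.)
    """
    if path == "/dbfs" or path.startswith("/dbfs/"):
        rest = path[len("/dbfs"):]
        return "dbfs:/" + rest.lstrip("/")
    if path.startswith("dbfs:"):
        return path
    raise ValueError(f"not a DBFS path: {path!r}")


if __name__ == "__main__":
    print(to_fuse_path("dbfs:/mnt/data/file.csv"))
    print(to_uri_path("/dbfs/mnt/data/file.csv"))
```

Raising ValueError for anything that carries neither prefix is a design choice in this sketch, so that an unrelated local path is not silently rewritten.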
Jujiro
by New Contributor III
  • 6593 Views
  • 11 replies
  • 7 kudos

Random error: At least one column must be specified for the table?

I have the following code in a notebook. It randomly gives me the error, "At least one column must be specified for the table." The error occurs (if it occurs at all) only on the first run after attaching to a cluster. Cluster details: Summary 5-1...

dbr-bug
Latest Reply
Harold
New Contributor II
  • 7 kudos

Please check whether this helps: spark.databricks.delta.catalog.update.enabled false

10 More Replies
LidorAbo
by New Contributor II
  • 5425 Views
  • 1 reply
  • 1 kudos

bucket ownership of s3 bucket in databricks

We have a Databricks job with strange behavior: when we pass the string 'output_path' to saveAsTextFile instead of the output_path variable, the data is saved to the following path: s3://dev-databricks-hy1-rootbucket/nvirginiaprod/3219117805926709/output_pa...

s3
Latest Reply
User16752239289
Valued Contributor
  • 1 kudos

I suspect you provided a DBFS path to save the data, hence the data was saved under your workspace root bucket. For the workspace root bucket, the Databricks workspace interacts with the Databricks credential to make sure Databricks has access to it and is able t...
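
The mix-up described in this thread, passing the literal string 'output_path' rather than the variable output_path, can be illustrated with a small standalone sketch. The resolve function, the dbfs:/ default root, and the bucket name below are all hypothetical, and this is not Databricks' actual path-resolution logic; it only shows why a string without a scheme can end up under a default root.

```python
from urllib.parse import urlparse

# Hypothetical default root, standing in for the workspace root storage.
DEFAULT_ROOT = "dbfs:/"


def resolve(path: str, default_root: str = DEFAULT_ROOT) -> str:
    """Return the path unchanged if it carries a scheme (s3://, dbfs:/, ...);
    otherwise place it under default_root. Illustration only."""
    if urlparse(path).scheme:
        return path
    return default_root.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical bucket name, for illustration only.
output_path = "s3://my-bucket/results"

if __name__ == "__main__":
    print(resolve(output_path))    # the variable: keeps its s3:// location
    print(resolve("output_path"))  # the literal: lands under the default root
```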

qwerty1
by Contributor
  • 1016 Views
  • 1 reply
  • 0 kudos

Unable to create bloom filter index

I am unable to create a bloom filter index on my table. CREATE BLOOMFILTER INDEX ON TABLE my_namespace.foo FOR COLUMNS (id OPTIONS (fpp = 0.1, numItems = 6000000)) gives the error: AnalysisException: Table `spark_catalog`.`my_namespace`.`foo` did not specif...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, you can refer to https://issues.apache.org/jira/browse/SPARK-27617 for the above error. Please let us know if this helps. Also, please tag @Debayan in your next response, which will notify me. Thank you!

gg_047320_gg_94
by New Contributor II
  • 7598 Views
  • 1 reply
  • 1 kudos

DLT Spark readstream fails on the source table which is overwritten

I am reading a source table that is updated every day. It is usually appended to or merged with updates, and is occasionally overwritten for other reasons. df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", True).option('starting...

Latest Reply
Debayan
Esteemed Contributor III
  • 1 kudos

Hi, could you please confirm the DLT and DBR versions? Also, please tag @Debayan in your next response, which will notify me. Thank you!

eyalo
by New Contributor II
  • 3618 Views
  • 6 replies
  • 0 kudos

Why doesn't the SFTP ingest work?

Hi, I wrote the following code, but the cluster seems to run for a long time and then stops without any results. My code is attached. (I used the 'com.springml.spark.sftp' library and installed it via Maven.) Also, I whitelisted my lo...

Latest Reply
eyalo
New Contributor II
  • 0 kudos

@Debayan Mukherjee Hi, I don't know if you got my reply, so I am bouncing my message to you again. Thanks.

5 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 2889 Views
  • 2 replies
  • 3 kudos

Resolved! Column is accessible after dropping the same column

Hi, today I saw some very strange behavior in Databricks. I dropped one column from a dataframe and assigned the result to a new dataframe, but I am still able to use the dropped column in the filter command. In the general scenario I should get an error but ...

Latest Reply
Sandeep
Contributor III
  • 3 kudos

@Ajay Pandey, this is known behavior. Please refer to this JIRA for details: https://issues.apache.org/jira/browse/SPARK-30421

1 More Replies
KarenBT
by New Contributor III
  • 4413 Views
  • 15 replies
  • 4 kudos

Welcome 2023 Virtual hackathon participants, we're happy to have you! ✋  Please use this space to ask questions, we'll have some folks from Da...

Welcome 2023 Virtual hackathon participants, we're happy to have you! Please use this space to ask questions, we'll have some folks from Databricks and the community join to help out. We're really excited to see what you work on and if you have any ...

Latest Reply
sanggusti
New Contributor II
  • 4 kudos

Hi, I also have another question. Do we get any Databricks platform access for the period of the hackathon? My company didn't use one, and the trial is only 14 days. I'm well aware of the capability, and since the hackathon is held by Databricks, I think...

14 More Replies
Michelle_-_Devp
by New Contributor III
  • 744 Views
  • 1 reply
  • 1 kudos

Resolved! How is brainstorming going?

Wondering if anyone is willing to share their project ideas here. It would be great to know how things are going and if anyone has a good open-source dataset they are willing to share.

Latest Reply
bayang
New Contributor III
  • 1 kudos

Good. Read their docs to get a lot of info to sharpen your work for this hackathon.

IndihomeTV
by New Contributor
  • 756 Views
  • 1 reply
  • 0 kudos

Databricks to redash

We have a security issue in Redash when using Databricks as a connector to Redash. Can you support us? https://www.databricks.com/blog/2020/06/24/welcoming-redash-to-databricks.html

Latest Reply
arpit
Valued Contributor
  • 0 kudos

Hi @Probis Useetv, thank you for reaching out to us. Would you please elaborate on your use case regarding the "issued security in redash"?

Ismail1
by New Contributor III
  • 1523 Views
  • 3 replies
  • 3 kudos

Resolved! Generating an Account console PAT token

I can't seem to find any documentation on generating an account console PAT token. Can anyone link me to it or guide me?

Latest Reply
fkseki
New Contributor III
  • 3 kudos

You can't create a Personal Access Token at the account level to use REST APIs. If you want to use SCIM at the account level, you'll find the user provisioning tab in the account console settings. There you can generate the SCIM token. If you want to acces...

2 More Replies
pantelis_mare
by Contributor III
  • 16537 Views
  • 31 replies
  • 15 kudos

Resolved! Repos configuration for Azure Service Principal

Hello community! I would like to update a repo from within my Azure DevOps release pipeline. In the pipeline I generate a token using an AAD Service Principal, as recommended, and I set up the Databricks API using that token. When I pass the databricks re...

Latest Reply
xiangzhu
Contributor II
  • 15 kudos

A traditional PAT may have a long lifespan, but the new SP feature uses an AAD token, which should have a much shorter lifespan, maybe around one hour. This could be a limiting factor. However, I haven't tested this yet, so these are merely hypotheses. Neve...

30 More Replies
Phani1
by Valued Contributor
  • 1870 Views
  • 2 replies
  • 1 kudos

Integrating Dolly with Databricks

Hi Databricks Team, could you please share any links, docs, or sample notebooks for integrating Dolly with Databricks? Our aim is to generate SQL queries from free text and execute them via a Databricks cluster or SQL warehouse.

Latest Reply
sean_owen
Honored Contributor II
  • 1 kudos

https://www.dbdemos.ai/demo.html?demoName=llm-dolly-chatbot is a good demonstration of Dolly (or really any LLM) for question answering. LLMs like this are not for SQL generation, but other LLMs are, such as starcoderbase.

1 More Replies
sanjay
by Valued Contributor II
  • 1158 Views
  • 2 replies
  • 1 kudos

Resolved! How can I prioritize messages in autoloader?

Hi, I am using autoloader. It picks up data from AWS S3 and stores it in a Delta table. When there is a large number of messages, I would like to process them by priority. Is it possible to prioritize messages in autoloader? Regards, Sanjay

Latest Reply
sanjay
Valued Contributor II
  • 1 kudos

Thank you, Sandeep. Another option is to keep messages in 2 different folders in S3. Can autoloader read messages from multiple folders?

1 More Replies