Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by DanVartanian (New Contributor II)
  • 5254 Views
  • 3 replies
  • 0 kudos

Resolved! Help trying to calculate a percentage

The image below shows what my source data is (HAVE) and what I'm trying to get to (WANT). I want to be able to calculate the percentage of bad messages (where formattedMessage = false) by source and date. I'm not sure how to achieve this in DatabricksS...

[images: have, want]
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

You could use a window function over source and date with a sum of messageCount. This gives you the total per source/date repeated on every line. Then apply a filter on formattedMessage == false and divide messageCount by the sum above (see the sketch below).

2 More Replies
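A minimal PySpark sketch of that window-function approach (the DataFrame df and the column names source, date, formattedMessage, and messageCount are assumptions based on the post):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Total message count per source/date, repeated on every row.
w = Window.partitionBy("source", "date")
totals = df.withColumn("total", F.sum("messageCount").over(w))

# Keep only the bad rows and express each as a percentage of its group total.
bad_pct = (totals
    .filter(F.col("formattedMessage") == F.lit(False))
    .withColumn("bad_pct", F.col("messageCount") / F.col("total") * 100)
    .select("source", "date", "bad_pct"))
```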
by SettlerOfCatan (New Contributor)
  • 2193 Views
  • 0 replies
  • 0 kudos

Access data within the blob storage without downloading

Our customer is using Azure's blob storage service to save big files so that we can work with them using an Azure online service, like Databricks. We want to read and work with these files with a computing resource obtained by Azure directly, without d...

Tags: blob-storage, Azure-ML, fileytypes, blob
by Azure_Data_Eng1 (New Contributor)
  • 479 Views
  • 0 replies
  • 0 kudos

data=[['x', 20220118, 'FALSE', 3],['x', 20220118, 'TRUE', 97],['x', 20220119, 'FALSE', 1],['x'...

data = [['x', 20220118, 'FALSE', 3], ['x', 20220118, 'TRUE', 97],
        ['x', 20220119, 'FALSE', 1], ['x', 20220119, 'TRUE', 49],
        ['Y', 20220118, 'FALSE', 100], ['Y', 20220118, 'TRUE', 900],
        ['Y', 20220119, 'FALSE', 200], ['Y', 20220119, 'TRUE', 800]]
df = spark.creat...

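The snippet is cut off mid-call; a hedged completion, assuming the intent was spark.createDataFrame (the column names below are guesses, since the post truncates before naming them):

```python
data = [['x', 20220118, 'FALSE', 3], ['x', 20220118, 'TRUE', 97],
        ['x', 20220119, 'FALSE', 1], ['x', 20220119, 'TRUE', 49],
        ['Y', 20220118, 'FALSE', 100], ['Y', 20220118, 'TRUE', 900],
        ['Y', 20220119, 'FALSE', 200], ['Y', 20220119, 'TRUE', 800]]

# Hypothetical column names; the original post is truncated at 'spark.creat...'.
df = spark.createDataFrame(data, ['source', 'date', 'formattedMessage', 'messageCount'])
df.show()
```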
by Soma (Valued Contributor)
  • 1933 Views
  • 3 replies
  • 2 kudos

Resolved! Query REST API endpoint in Databricks Standard Workspace

Do we have an option to query a Delta table using a Standard workspace as an endpoint, instead of JDBC?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@somanath Sankaran - Would you be happy to mark @Hubert Dudek's answer as best if it resolved the problem? That helps other members who are searching for answers find the solution more quickly.

2 More Replies
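The accepted answer isn't quoted in the listing, but one HTTP-based alternative to JDBC for querying a Delta table through a SQL endpoint is the SQL Statement Execution API; a sketch using requests (the warehouse ID and table name are placeholders, and this API may postdate the thread):

```python
import requests

host = "https://<workspace>.cloud.databricks.com"
headers = {"Authorization": "Bearer <personal_token>"}

# Submit a SQL statement to a SQL warehouse and wait briefly for an inline result.
resp = requests.post(
    f"{host}/api/2.0/sql/statements",
    headers=headers,
    json={
        "statement": "SELECT * FROM my_delta_table LIMIT 10",  # hypothetical table
        "warehouse_id": "<warehouse_id>",
        "wait_timeout": "30s",
    },
)
resp.raise_for_status()
print(resp.json().get("result"))
```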
by MattM (New Contributor III)
  • 3067 Views
  • 4 replies
  • 4 kudos

Resolved! Schema parsing issue when the datatype of a source field is mapped incorrectly

I have a complex JSON file which has a massive struct column. We regularly have issues when we try to parse this JSON file by forming our case class to extract the fields from the schema. With this approach, the issue we are facing is that if one data type of...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there, @Matt M - If @Hubert Dudek's response solved the issue, would you be happy to mark his answer as best? It helps other members find the solution more quickly.

3 More Replies
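The resolved answer isn't shown in full here, but a common defence against per-field type mismatches is an explicit schema with PERMISSIVE mode, so offending rows land in a corrupt-record column instead of breaking the parse; a sketch with a hypothetical schema and path:

```python
from pyspark.sql.types import StructType, StructField, LongType, StringType

# Hypothetical schema; in practice this mirrors the file's struct column.
schema = StructType([
    StructField("id", LongType()),
    StructField("payload", StringType()),
    StructField("_corrupt_record", StringType()),  # receives rows that don't fit
])

df = (spark.read
    .schema(schema)
    .option("mode", "PERMISSIVE")
    .option("columnNameOfCorruptRecord", "_corrupt_record")
    .json("/path/to/complex.json"))

df.cache()  # Spark restricts queries that reference only the corrupt column; caching avoids this
bad_rows = df.filter("_corrupt_record IS NOT NULL")
```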
by BorislavBlagoev (Valued Contributor III)
  • 4202 Views
  • 9 replies
  • 3 kudos

Resolved! Trying to create an incremental pipeline, but it fails when I use outputMode "update"

def upsertToDelta(microBatchOutputDF, batchId):
    microBatchOutputDF.createOrReplaceTempView("updates")
    microBatchOutputDF._jdf.sparkSession().sql("""
        MERGE INTO old o
        USING updates u
        ON u.id = o.id
        WHEN MATCHED THEN UPDATE SE...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

The Delta table/file version is too old. Please try to upgrade it as described here: https://docs.microsoft.com/en-us/azure/databricks/delta/versioning

8 More Replies
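For reference, a self-contained sketch of the upsert pattern the post is attempting, using the DeltaTable merge API rather than SQL through _jdf (the source stream stream_df and checkpoint path are assumptions; the target table old is taken from the post and assumed to be a Delta table):

```python
from delta.tables import DeltaTable

def upsert_to_delta(micro_batch_df, batch_id):
    # Merge each micro-batch into the target Delta table on id.
    target = DeltaTable.forName(micro_batch_df.sparkSession, "old")
    (target.alias("o")
        .merge(micro_batch_df.alias("u"), "u.id = o.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(stream_df.writeStream
    .foreachBatch(upsert_to_delta)
    .outputMode("update")
    .option("checkpointLocation", "/tmp/checkpoints/upsert_demo")  # hypothetical path
    .start())
```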
by Disney (New Contributor II)
  • 1248 Views
  • 1 replies
  • 5 kudos

Resolved! We have hundreds of ETL processes (Informatica) with a lot of logic pulling various data from applications into a relational DB (target DB). Can we use Delta Lake as the target DB?

Hi DB Support, can we use DB's Delta Lake as our target DB? Here's our situation... We have hundreds of ETL jobs pulling from these sources (SAP, Siebel/Oracle, Cognos, Postgres). Our ETL process has all of the logic, and our target DB is an MPP syst...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Hi, yes you can. The best approach is to create a SQL endpoint in a Premium workspace and write to Delta Lake as you would to SQL. Also, this is a community forum, not support; you can contact Databricks via https://databricks.com/company/contact or via AWS/Azure if you have su...

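A minimal sketch of the reply's suggestion, writing ETL output straight to a Delta table as the target (the DataFrame and table names are illustrative):

```python
# Append one batch of transformed records to a Delta table acting as the target DB.
(etl_output_df.write
    .format("delta")
    .mode("append")
    .saveAsTable("target_db.customer_orders"))  # hypothetical target table
```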
by Bilal1 (New Contributor III)
  • 1617 Views
  • 1 replies
  • 0 kudos

Invalid string or buffer length (0)

I've created a simple query reading all columns from a table. I've published the results on a dashboard; however, I receive the following error. I cannot seem to find any info online on how to resolve this issue. Any ideas?

Latest Reply
Bilal1
New Contributor III
  • 0 kudos

Resolved. I forgot to save the query.

by DoD (New Contributor III)
  • 1676 Views
  • 2 replies
  • 1 kudos

Resolved! Why are R scripts inside of Databricks notebooks creating writeLines errors?

I recently posted this on Stack Overflow. I'm using R in Databricks. RStudio runs fine and executes from the Databricks cluster. I would like to transition from RStudio to notebooks. When I start the cluster, R seems to run fine from notebooks. ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Paul Evangelista - Thank you for letting us know. You did great! Would you be happy to mark your answer as best so that others can find your solution more easily?

1 More Replies
by wyzer (Contributor II)
  • 7456 Views
  • 2 replies
  • 2 kudos

Why are database/table names in lower case?

Hello, when I run this code: CREATE DATABASE BackOffice, I see the database like this: backoffice. Why is everything in lower case? Is it possible to configure Databricks to keep the original name? Thanks.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

It is managed by the Hive metastore, and since data can sit in different databases it is safer this way, as some databases are case sensitive and some are not (you can easily test it with standard WHERE syntax). You could probably change it with some Hive settings, but i...

1 More Replies
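A quick way to see the metastore's lower-casing from a notebook:

```python
spark.sql("CREATE DATABASE IF NOT EXISTS BackOffice")

# The Hive metastore stores identifiers in lower case, so this finds 'backoffice'.
spark.sql("SHOW DATABASES LIKE 'backoffice'").show()
```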
by hetadesai (New Contributor II)
  • 5556 Views
  • 1 replies
  • 3 kudos

How to download a zip file from an SFTP location, put it into Azure Data Lake, and unzip it there?

I have a zip file on an SFTP location. I want to copy that file from the SFTP location, put it into Azure Data Lake, and unzip it there using a Spark notebook. Please help me solve this.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

I would go with @Kaniz Fatma's approach: download the data in Data Factory and, once the download succeeds, trigger a Databricks Spark notebook. Spark can also read compressed data, so you may not even need a separate unzip step (see the sketch below).

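If the archive does need unpacking inside a notebook, the standard-library route over the /dbfs FUSE mount is one workable sketch (the paths are hypothetical, and the file is assumed to have already landed in the lake):

```python
import zipfile

# /dbfs exposes DBFS (including mounted ADLS paths) as a local filesystem.
src = "/dbfs/mnt/datalake/incoming/archive.zip"   # hypothetical mount and file
dst = "/dbfs/mnt/datalake/incoming/extracted"

with zipfile.ZipFile(src) as zf:
    zf.extractall(dst)

# The extracted files can then be read normally with Spark.
df = spark.read.csv("dbfs:/mnt/datalake/incoming/extracted", header=True)
```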
by BorislavBlagoev (Valued Contributor III)
  • 19129 Views
  • 16 replies
  • 10 kudos

Resolved! Error in databricks-sql-connector

from databricks import sql

hostname = '<name>.databricks.com'
http_path = '/sql/1.0/endpoints/<endpoint_id>'
access_token = '<personal_token>'

connection = sql.connect(server_hostname=hostname, http_path=http_path, access_token=access_token)
cu...

Latest Reply
NiallEgan__Data
New Contributor III
  • 10 kudos

Hi @Borislav Blagoev, thanks very much for taking the time to collect these logs. The problem here (as indicated by the `IpAclValidation` message) is that IP allow listing (enabled for your workspace) will not allow arbitrary connections from Spark c...

15 More Replies
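For reference, a hedged completion of the snippet above (it cuts off at cu...), following the connector's PEP 249-style API:

```python
from databricks import sql

connection = sql.connect(
    server_hostname='<name>.databricks.com',
    http_path='/sql/1.0/endpoints/<endpoint_id>',
    access_token='<personal_token>',
)

# Presumably the truncated code continued roughly like this.
cursor = connection.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())

cursor.close()
connection.close()
```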
by SajiD (New Contributor)
  • 1177 Views
  • 0 replies
  • 0 kudos

Snowflake Connector for Databricks

Hi everyone, I am working with Databricks notebooks and I am facing an issue with the Snowflake connector: I want to use DDL/DML through it. Can someone please help me out with this? Thanks in advance!

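Since no replies are shown: the Spark-Snowflake connector is oriented toward reads and writes, so for DDL/DML one option (an assumption about the poster's setup) is the plain Snowflake Python connector, which runs arbitrary statements through a cursor:

```python
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    user='<user>',
    password='<password>',
    account='<account_identifier>',
    warehouse='<warehouse>',
    database='<database>',
)

# DDL and DML execute like any other statement.
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS demo_tbl (id INT, name STRING)")
cur.execute("INSERT INTO demo_tbl VALUES (1, 'a')")
cur.close()
conn.close()
```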
by Olli (New Contributor III)
  • 2814 Views
  • 3 replies
  • 0 kudos

Resolved! Auto Loader streams fail after interruption from cluster termination, unable to locate checkpoint/metadata or metadata/rocksdb/SSTs/sst files

I have a pipeline with 20+ streams running based on Auto Loader. The pipeline crashed, and after the crash I'm unable to start the streams; they fail with one of the following messages: 1) The metadata file in the streaming source checkpoint direct...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Olli Tiihonen - Thanks for letting us know. I'm glad you were able to get to the bottom of things.

2 More Replies
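For context, the shape of an Auto Loader stream and the checkpoint the errors refer to; if the checkpoint directory is lost or corrupted, pointing the writer at a fresh checkpoint location lets the stream start again, at the cost of reprocessing (all paths here are hypothetical):

```python
# Auto Loader source; the checkpoint tracks offsets plus discovered-file state (RocksDB).
stream_df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")
    .load("/mnt/landing/events"))

query = (stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events_v2")  # fresh path after corruption
    .start("/mnt/delta/events"))
```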
by NextIT (New Contributor)
  • 637 Views
  • 0 replies
  • 0 kudos

www.nextitvision.com

Online IT Training: ERP/SAP Online Training | JAVA Online Training | C++ Online Training | ORACLE Online Training | Online Python Training | Machine Learning Training. If you need more details and information regarding IT online training, please visi...

