Data Engineering

Forum Posts

by AndLuffman (New Contributor II)
  • 1274 Views
  • 5 replies
  • 1 kudos

QRY Results incorrect but Exported data is OK

I ran the query "SELECT * FROM fact_Orders". This presented a lot of garbage: the correct column headers, but the contents were extremely random, e.g. blanks in the key column and VAT rates of 12282384234E-45. When I export to CSV, it presents fi...

Latest Reply
Kaniz (Community Manager)
  • 1 kudos

Hi @AndLuffman, The issue you're experiencing might be related to the limitations of the Databricks interface when dealing with large datasets with many columns. The interface has a limit on the number of rows it can display at once, which can lead t...
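A minimal sketch of one way to verify the underlying data, assuming a table named fact_Orders: write the full query result to CSV with Spark instead of relying on the results grid, which truncates large outputs. The output path is a placeholder.

df = spark.table("fact_Orders")
(df.coalesce(1)                      # single output file for easy inspection
   .write.mode("overwrite")
   .option("header", "true")
   .csv("/tmp/fact_orders_export"))  # hypothetical path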

4 More Replies
by Erik_L (Contributor II)
  • 820 Views
  • 2 replies
  • 1 kudos

Structured Streaming from TimescaleDB?

I realize that the best practice would be to integrate our service with Kafka as a streaming source for Databricks, but given that the service already stores data into TimescaleDB, how can I stream data from TimescaleDB into DBX? Debezium doesn't wor...

Latest Reply
Kaniz (Community Manager)
  • 1 kudos

Hi @Erik_L, Currently, there is no direct way to stream data from TimescaleDB into Databricks. However, there are a couple of ways you can approach this: 1. **Kafka Integration**: You can integrate Kafka into your service for consuming data. Kafka i...
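A minimal sketch of the Kafka route suggested above; the broker address and topic name are placeholders, not details from this thread.

df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
      .option("subscribe", "timescale_changes")          # hypothetical topic
      .option("startingOffsets", "latest")
      .load())
# Kafka delivers key/value as binary, so cast before further processing.
events = df.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")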

1 More Replies
by kg6ka (New Contributor)
  • 1183 Views
  • 2 replies
  • 1 kudos

Is it possible to do without the GitHub token and integration?

Hey guys, I have a question. I have Databricks jobs in a workflow that are linked to my Databricks repo, which contains the necessary scripts for one job or another. That is, the job is linked to the Databricks repo. The main code is developed in gi...

Latest Reply
Kaniz (Community Manager)
  • 1 kudos

Hi @kg6ka, Based on the provided information, you are trying to push code from Github to Databricks repo using Databricks REST API. However, the error message you are getting indicates that you are missing Git provider credentials.  According to the...
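A sketch of registering Git provider credentials with the Databricks Git Credentials API, which the missing-credentials error points to; the host, tokens, and username are placeholders.

import requests

host = "https://<your-workspace>.cloud.databricks.com"
resp = requests.post(
    f"{host}/api/2.0/git-credentials",
    headers={"Authorization": "Bearer <databricks-pat>"},
    json={
        "git_provider": "gitHub",
        "git_username": "<github-username>",
        "personal_access_token": "<github-pat>",
    },
)
resp.raise_for_status()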

1 More Replies
by romangehrn (New Contributor)
  • 334 Views
  • 0 replies
  • 0 kudos

speed issue DBR 13+ for R

I got a notebook running on DBR 12.2 with the following R code:

install.packages("microbenchmark")
install.packages("furrr")
library(microbenchmark)
library(tidyverse)

# example tibble
df_test <- tibble(id = 1:100000, street_raw = rep("Bahnhofs...

Data Engineering
DBR 13
performance slow
R
speed error
by 210573 (New Contributor)
  • 1610 Views
  • 4 replies
  • 2 kudos

Unable to stream from google pub/sub

I am trying to run the code below to subscribe to a Pub/Sub topic, but it throws this exception: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/DataSourceV2. I have tried using all versions of https://mvnrepository.com/artifact/com.google...

Latest Reply
Ajay-Pandey (Esteemed Contributor III)
  • 2 kudos

Hi @210573, Databricks now supports Pub/Sub streaming natively, so you can use it for your use case. For more info, see the official docs: PUB/SUB with Databricks.
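A minimal sketch of the native Pub/Sub source; the option names follow the Databricks documentation as I recall it, and all project, topic, subscription, and credential values are placeholders.

auth_options = {
    "clientEmail": "<service-account-email>",
    "clientId": "<client-id>",
    "privateKey": "<private-key>",
    "privateKeyId": "<private-key-id>",
}
df = (spark.readStream.format("pubsub")
      .option("subscriptionId", "my-subscription")  # hypothetical
      .option("topicId", "my-topic")                # hypothetical
      .option("projectId", "my-gcp-project")        # hypothetical
      .options(**auth_options)
      .load())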

3 More Replies
by vonjack (New Contributor II)
  • 975 Views
  • 3 replies
  • 1 kudos

Resolved! How to unload a JAR for a UDF without restarting the Spark context?

In a Scala notebook in Databricks, I created a temporary function with a certain JAR and class name. Then I want to update the JAR, but without restarting the context I cannot reload the new JAR; the temporary function always reuses the old classes....

Latest Reply
Kaniz (Community Manager)
  • 1 kudos

Hi @vonjack, I'm sorry, but based on the information provided, there isn't a direct way to refresh the classes of a temporary function in Databricks without restarting the context.  Databricks does support updating JARs, including replacing a default...
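For reference, a sketch of the temporary-function registration pattern in question (PySpark calling Spark SQL); the class name and JAR path are hypothetical. As the reply notes, the old classes stay cached in the driver's classloader, so re-registering alone does not pick up a replaced JAR until the context is restarted.

spark.sql("DROP TEMPORARY FUNCTION IF EXISTS my_udf")
spark.sql("""
    CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'  -- hypothetical class
    USING JAR '/dbfs/jars/my_udf_v2.jar'                     -- hypothetical path
""")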

2 More Replies
by sparkrookie (New Contributor II)
  • 957 Views
  • 2 replies
  • 0 kudos

Structured Streaming Delta Table - Reading and writing from same table

Hi, I have a structured streaming job that reads from a Delta table "A" and pushes to another Delta table "B". A schema: group_key, id, timestamp, value. B schema: group_key, watermark_timestamp, derived_value. One requirement is that I need to get the m...

Latest Reply
KarenGalvez (New Contributor II)
  • 0 kudos

Navigating the intricacies of structured streaming and Delta table operations on the same platform has been a stimulating yet demanding task. The community at Databricks has been instrumental in clarifying nuances. As I delve deeper, I'm reminded of ...

1 More Replies
by Michael_Galli (Contributor II)
  • 387 Views
  • 1 reply
  • 0 kudos

Resolved! Migrate from non-CDC sources to CDC. Downstream consequences?

I have a question regarding streaming with CDC. We currently have a Delta table where CDC is not yet enabled, and it's the source for downstream streams that read from that table. To catch the changes for a new use case, we need to enable CDC on tha...

Latest Reply
Kaniz (Community Manager)
  • 0 kudos

Hi @Michael_Galli, Enabling CDC on your Delta table should not directly affect your existing downstream streams. However, there are some considerations you should keep in mind: • If the schema of the Delta table changes after a streaming read has beg...
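A minimal sketch of enabling and consuming the change feed, assuming the source table is named A as in the question; the option and property names follow the Delta change data feed documentation.

spark.sql("ALTER TABLE A SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

changes = (spark.readStream.format("delta")
           .option("readChangeFeed", "true")
           .table("A"))
# Each row carries _change_type, _commit_version, and _commit_timestamp columns.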

by shraddharane (New Contributor)
  • 1421 Views
  • 2 replies
  • 0 kudos

Migrating legacy SSAS cube to databricks

We have a SQL database designed as a star schema. We are migrating data from SQL to Databricks. There are cubes designed using SSAS, which end users use in Excel for analysis purposes. We are now looking for a solution for: 1) Can...

Latest Reply
Kaniz (Community Manager)
  • 0 kudos

Hi @shraddharane, 1) Can cubes be migrated? No, SSAS cubes cannot be directly migrated to Databricks. Databricks does not support the concept of multidimensional cubes like SSAS. Databricks is a Lakehouse architecture built on the foundation of Delta ...

1 More Replies
by 140015 (New Contributor III)
  • 967 Views
  • 3 replies
  • 1 kudos

Resolved! Using DLT pipeline with non-incremental data

Hi, I would like to know what you think about using Delta Live Tables when the source for the pipeline is not incremental. What I mean by that is: suppose the data provider creates a new folder with files for me each time it has an update to the...

Latest Reply
Joe_Suarez (New Contributor III)
  • 1 kudos

When dealing with B2B data building, the process of updating and managing your data can present unique challenges. Since your data updates involve new folders with files and you need to process the entire new folder, the concept of incremental proces...

2 More Replies
by GNarain (New Contributor II)
  • 3136 Views
  • 12 replies
  • 5 kudos

Resolved! Is there api call to set "Table access control" workspace config ?

Is there api call to set "Table access control" workspace config ?

Latest Reply
Kaniz (Community Manager)
  • 5 kudos

Hi @GNarain, here is an example of the API call. Could you try it and let us know?

POST /api/2.0/workspace/update
{ "workspaceAccessControlEnabled": true }

This API call will enable table access control for your workspace. You can make this API call u...
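A sketch of issuing that call from Python; the endpoint and payload are taken verbatim from the reply above, so verify them against the current Databricks REST API reference before relying on them. The host and token are placeholders.

import requests

host = "https://<your-workspace>.cloud.databricks.com"
resp = requests.post(
    f"{host}/api/2.0/workspace/update",
    headers={"Authorization": "Bearer <databricks-pat>"},
    json={"workspaceAccessControlEnabled": True},
)
print(resp.status_code, resp.text)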

11 More Replies
by Eldar_Dragomir (New Contributor II)
  • 814 Views
  • 1 reply
  • 2 kudos

Resolved! Reprocessing the data with Auto Loader

Could you please give me an idea of how I can start reprocessing my data? Imagine I have a folder in ADLS Gen2, "/test", with binaryFiles. They were already processed with the current pipeline. I want to reprocess the data + continue receiving new data. What t...

Latest Reply
Tharun-Kumar (Honored Contributor II)
  • 2 kudos

@Eldar_Dragomir In order to re-process the data, we have to change the checkpoint directory. This will start processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger, to limit the number of files getting processed per micro-...
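A minimal sketch of that approach, assuming binary files under "/test": pointing the stream at a fresh checkpoint makes Auto Loader re-discover every existing file, and cloudFiles.maxFilesPerTrigger throttles each micro-batch. Paths and the target table name are placeholders.

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")
      .option("cloudFiles.maxFilesPerTrigger", 100)  # limit files per micro-batch
      .load("abfss://<container>@<account>.dfs.core.windows.net/test"))

(df.writeStream
   .option("checkpointLocation", "/checkpoints/test_v2")  # new checkpoint dir
   .toTable("bronze_test"))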

by anarad429 (New Contributor)
  • 720 Views
  • 1 reply
  • 1 kudos

Resolved! Unity Catalog + Reading variable from external notebook

I am trying to run a notebook which reads some of its variables from an external notebook (I used the %run command for that purpose), but it keeps giving me an error that these variables are not defined. This sequence of notebooks runs perfectly fine on a...

Latest Reply
Atanu (Esteemed Contributor)
  • 1 kudos

I think the issue here is that the variable is not created until a value is assigned to it. So, you may need to assign a value to get_sql_schema.
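A sketch of the %run pattern, assuming a child notebook named ./setup; the variable value is hypothetical. %run must sit alone in its own cell, and the child must actually assign the variable for it to exist in the parent's session.

# --- child notebook ./setup ---
get_sql_schema = "my_schema"  # assign a value so the name is defined

# --- parent notebook, in a cell of its own ---
# %run ./setup

# --- a later parent cell can now use it ---
print(get_sql_schema)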

by NathanLaw (New Contributor III)
  • 421 Views
  • 1 reply
  • 0 kudos

CPU and GPU Elapse Runtimes

I have 2 questions about elapsed job runtimes. The same scoring notebook is run 3 times as 3 jobs. The jobs are identical: same Petastorm code, CPU cluster config (not Spot cluster), and data, but they have varying elapsed runtimes. Elapsed runtimes...

Latest Reply
shyam_9 (Valued Contributor)
  • 0 kudos

Hi @NathanLaw, could you please confirm whether you have set any parameters for the best model? Does it stop after running some epochs if there is no improvement in model performance?

by Sanjay_AMP (New Contributor II)
  • 412 Views
  • 1 reply
  • 1 kudos

Deployment-ready sample source-code for Delta Live Table & Autoloader

Hi all, we are planning to develop an Autoloader-based DLT pipeline that needs to be: deployable via a CI/CD pipeline, and observable. Can somebody please point me to source code that we can start with as a firm foundation instead of falling into a newbie pattern ...

Latest Reply
Priyanka_Biswas (Valued Contributor)
  • 1 kudos

Hi @Sanjay_AMP, Delta Live Tables and Auto Loader can be used together to incrementally ingest data from cloud object storage. Python code example: define a table called "customers" that reads data from a CSV file in cloud object storage; define a...
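A minimal sketch of that pattern, assuming a CSV landing path; the path and table comment are placeholders.

import dlt

@dlt.table(comment="Customers ingested incrementally with Auto Loader")
def customers():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader source
             .option("cloudFiles.format", "csv")
             .option("header", "true")
             .load("s3://<bucket>/landing/customers/")  # hypothetical path
    )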
